Abstract

We consider the problem of learning the inhomogeneous intensity of a counting
process, under a sparse segmentation assumption. We introduce a weighted total-variation
penalization, using data-driven weights that correctly scale the penalization
along the observation interval. We prove that this leads to a sharp tuning of the convex
relaxation of the segmentation prior, by stating oracle inequalities with fast rates of
convergence, and consistency for change-point detection. This provides the first theoretical
guarantees for segmentation with a convex proxy beyond the standard i.i.d. signal
+ white noise setting. We introduce a fast algorithm to solve this convex problem.
Numerical experiments illustrate our approach on simulated data and on a high-frequency
genomics dataset.
1 Introduction

Counting processes are widely used in engineering to describe systems in which stochastic
events occur, in fields such as genomics, biology, econometrics, communications and networks,
see [2]. In these problems, the aim is to estimate the intensity function, which determines
the instantaneous rate of occurrence of an event. This topic has been extensively discussed
in the statistical literature. Procedures based on kernel estimation [29], cross-validation [20],
wavelet methods [26], local polynomial estimators [13] and model selection [30] have been
considered for the non-parametric estimation of the intensity.

In this paper, we want to recover the intensity $\lambda_0(t)$ of a counting process $\{N(t), t \in
[0,1]\}$ from $n$ observations of $N$. We work under the assumption that $\lambda_0$ can be well
approximated by a piecewise constant function, and we address this problem from a
signal segmentation point of view, where the goal is to find the unknown times of abrupt
changes in the dynamics of the signal. This is referred to as the multiple change-point problem
in the statistical literature, see [24] for a recent review with interesting references. A change-point
is a time or position where the structure of the object changes, and the goal of
change-point detection is to estimate these positions.

Several examples of practical importance fit the multiple change-point model. A
particularly interesting example comes from the DNA next-generation sequencing (NGS)
process. Indeed, an important application of NGS technologies is the study of the transcriptome,
and the resulting experiment is called RNA-seq. In a typical RNA-seq experiment,
a sample of RNA is amplified, shattered, and converted to a library of cDNA
fragments. Then, it is sequenced on a commercially available high-throughput platform.
Finally, the raw data consist of large amounts of DNA fragment sequences called
reads. These reads are then mapped to the reference genome by an appropriate algorithm,
which identifies the region each read comes from. RNA-seq can be modelled mathematically
as replications of an inhomogeneous counting process with a piecewise constant
intensity [32]. The counting process counts the number of reads whose first base maps to
the left of a given location on the chromosome. In [32], a Bayesian approach for the
detection of change-points is considered. Other approaches based on Bayesian model-based
clustering and segmentation are given in [27].

In the present paper, we consider the estimation of $\tau_{0,\ell}$ and $\beta_{0,\ell}$ in the following model:

$$\lambda_0(t) = \sum_{\ell=1}^{L_0} \beta_{0,\ell}\, \mathbb{1}_{(\tau_{0,\ell-1}, \tau_{0,\ell}]}(t) \tag{1}$$

for $0 \le t \le 1$, with the convention $\tau_{0,0} = 0$ and $\tau_{0,L_0} = 1$. Our approach consists in
reframing this task as a variable selection task. We introduce a penalized least-squares
criterion with a data-driven total-variation penalization, which is an $\ell_1$-penalization of the
discrete gradient of the parameter.

This convex proxy for segmentation, with an extra $\ell_1$-penalization for sparsity, is called
the fused Lasso and was introduced in [33]. Theoretical guarantees for this procedure are given
in [21] in the white noise setting, for the segmentation of a one-dimensional signal. A
group fused Lasso is introduced in [7] for the detection of multiple change-points shared
by a set of co-occurring one-dimensional signals, and an algorithm is derived to solve the
corresponding convex problem. The determination of the number of structural changes in
multitask learning via the group fused Lasso is considered in [28].

Beyond the one-dimensional setting, total-variation penalization is well-known and 
commonly used in image denoising, deblurring and segmentation, see for instance [12] 
and [11]. In this context, one needs to define a graph of neighboring nodes (pixels), and 
the problem can be solved efficiently by reformulating it as a min-cut problem and solving 
it using a max-flow algorithm [22]. 

Other closely related references are the following: [18] proves sharp oracle inequalities for the
Lasso in hazards models, [15] studies Lasso-type estimators in a linear regression model
with multiple change-points, [31] considers denoising of a sparse and block signal, and [9] studies
the asymptotics of jump-penalized least-squares regression, which approximates
a regression function by piecewise constant functions. A majorization-minimization
algorithm for high-dimensional fused Lasso regression is proposed in [35], and a testing
approach for the segmentation of the hazard function is given in [19].

The papers [30], [33], [21], [7] and [28] are the most relevant to our work. In [30], a model
selection procedure is introduced to estimate the intensity function. In [21] and [33],
the authors propose an adaptation of the Lasso algorithm to detect change-points in the
standard i.i.d. signal + Gaussian white noise framework. In [7] and [28], the authors use the
group fused Lasso to detect structural change-points in linear regression problems. This
paper differs from these works in the following aspects. First, a main feature of our
results is that they are derived for a signal in continuous time, as compared to [21], [28]
and [33]. Namely, we aim at detecting change-points in the intensity function. Hence, this
problem is prone to an unavoidable non-parametric bias of approximation by a piecewise
constant function, which makes our mathematical analysis very different. A second main
feature of our results is that we introduce a weighted total-variation penalization, using
data-driven weights that correctly scale the penalization along the observation interval.
This is not necessary in the Gaussian and discrete signal + noise setting of [21], for
instance. As a by-product, we are able to use the same tuning parameters both for the
oracle inequalities, see Theorems 1 and 2, and for the detection of change-points, see
Theorems 3 and 4. A third main feature of our approach is that we use a convex surrogate
for the sparsity of the discrete gradient of the signal, which can be solved numerically
very efficiently, see Section 5, even for a large signal (using many bins). This is not the
case for the approach described in [30], which is based on $\ell_0$ model-selection techniques.
Furthermore, our oracle inequalities are sharp in the sense that the leading constant in
front of the bias term is equal to one.

The rest of the paper is organized as follows. In Section 2, we provide basic notations. 
Then, we present our estimation procedure. Section 3 develops oracle inequalities for the 
estimator, see Theorems 1 and 2. Section 4 gives results in change-points detection, see 
Theorems 3 and 4. Section 5 describes a fast algorithm to solve the convex problem studied 
in the paper. The proofs of the main statements are gathered in Sections 7, 8 and 9. 

2 Counting processes with a sparse segmentation prior 

Let $(\Omega, \mathcal{F}, \mathbb{P})$ be a probability space and $(\mathcal{F}_t)_{0 \le t \le 1}$ a filtration satisfying the usual
conditions [25]: increasing, right-continuous and complete. A counting process is a stochastic
process $\{N(t)\}_{0 \le t \le 1}$ which is adapted to the filtration $(\mathcal{F}_t)$, with right-continuous and
piecewise constant paths almost surely (a.s.), with jumps of size $+1$ at event times, such
that $N(0) = 0$ and $N(t) < \infty$ a.s. The term counting process is natural: $N(t) - N(s)$
corresponds to the number of events of a certain type occurring in the interval $(s,t]$. The
Poisson process is the most common example of a counting process, where the jumps occur
randomly and independently of each other on disjoint intervals, see for instance [10]
and [23] for references on point processes and their statistical estimation.

Since $N$ is increasing, it is a submartingale, so the Doob-Meyer decomposition
theorem [1] applies. Namely, $N = \Lambda_0 + M$, where $\Lambda_0$ is a predictable increasing process
called the compensator of $N$ and $M$ is an $(\mathcal{F}_t)$-martingale. We assume in the following that

$$\Lambda_0(t) = \mathbb{E}[N(t)] = \int_0^t \lambda_0(s)\,ds \tag{2}$$

for $0 \le t \le 1$, where $\lambda_0$ is a non-negative right-continuous function with left-hand limits,
called the intensity rate of $N$. Under this assumption, $M(t) = N(t) - \int_0^t \lambda_0(s)\,ds$ is a local
square-integrable martingale with predictable quadratic variation $\langle M \rangle(t) = \int_0^t \lambda_0(s)\,ds$ and
optional variation $[M](t) = \int_0^t dN(s)$.

2.1 Sparse segmentation assumption 

We work under the assumption that the intensity is piecewise constant over unknown
inhomogeneous intervals of time. From now on, $\mathbb{1}_A$ stands for the indicator function of a
set $A$. For some results in the paper, we will use the following assumption.

Assumption 1. We assume that the intensity writes

$$\lambda_0(t) = \sum_{\ell=1}^{L_0} \beta_{0,\ell}\, \mathbb{1}_{I_{0,\ell}}(t), \quad 0 \le t \le 1, \tag{3}$$

with $L_0 \ge 1$, where the $\beta_{0,\ell}$ are positive coefficients, and where $I_{0,\ell} = (\tau_{0,\ell-1}, \tau_{0,\ell}]$ for
$\ell = 1, \ldots, L_0$ and $\tau_{0,0} = 0 < \tau_{0,1} < \cdots < \tau_{0,L_0-1} < \tau_{0,L_0} = 1$.

Assumption 1 means that $L_0 - 1$ changes affect the value of $\lambda_0$ at unknown instants
$\tau_{0,\ell}$. The number of change-points $L_0 - 1$ is unknown. In this setting, we want to recover
the intensity $\lambda_0$ by jointly estimating $L_0$, the change-points $\tau_{0,\ell}$ for $\ell = 1, \ldots, L_0 - 1$, and
the levels $\beta_{0,\ell}$ for $\ell = 1, \ldots, L_0$. Throughout the paper, we will assume the following.

Assumption 2. We observe $n$ i.i.d. copies of $N$ on $[0, 1]$, denoted $N_1, \ldots, N_n$.

The restriction to the interval $[0,1]$ is for the sake of simplicity. Assumption 2
is equivalent to observing a single process $N$ with intensity $n\lambda_0$, and it is only used to
have a notion of growing observations with an increasing $n$.
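As an illustration of Assumption 2, the following Python sketch simulates the pooled events of $n$ i.i.d. copies under the additional assumption, not required by the paper, that each $N_i$ is a Poisson process with the piecewise constant intensity (3). Pooling the $n$ copies then gives a single Poisson process with intensity $n\lambda_0$. The function name `simulate_pooled_events` and its arguments are hypothetical.

```python
import random
from typing import List, Sequence


def simulate_pooled_events(taus: Sequence[float], betas: Sequence[float],
                           n: int, seed: int = 0) -> List[float]:
    """Sorted event times on (0, 1] of a Poisson process with intensity n * lambda_0.

    Assumes lambda_0 = sum_l betas[l] * 1{(taus[l], taus[l+1]]} as in (3), with
    taus = [0, tau_1, ..., tau_{L0-1}, 1] and positive levels betas (length L0).
    """
    if len(taus) != len(betas) + 1:
        raise ValueError("taus must contain L0 + 1 points for L0 levels")
    rng = random.Random(seed)
    events: List[float] = []
    for left, right, beta in zip(taus[:-1], taus[1:], betas):
        rate = n * beta
        # On each piece the process is homogeneous, so the gaps between events are
        # exponential; restarting at `left` is valid by memorylessness.
        t = left + rng.expovariate(rate)
        while t <= right:
            events.append(t)
            t += rng.expovariate(rate)
    return events
```

For instance, `simulate_pooled_events([0, 0.5, 1], [2.0, 6.0], n=1000)` returns about $1000 \times (2 \times 0.5 + 6 \times 0.5) = 4000$ event times, three quarters of them in $(0.5, 1]$.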


2.2 A procedure based on total-variation penalization 

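Fix $m = m_n \ge 1$, an integer that shall go to infinity as $n \to \infty$. Let us define the set of
nonnegative piecewise constant functions on $[0,1]$ given by

$$\Lambda_m = \Big\{ \lambda_\beta = \sum_{j=1}^m \beta_{j,m} \lambda_{j,m} : \beta = [\beta_{j,m}]_{1 \le j \le m} \in \mathbb{R}_+^m \Big\}, \tag{4}$$

where

$$\lambda_{j,m} = \sqrt{m}\, \mathbb{1}_{I_{j,m}} \quad \text{and} \quad I_{j,m} = \Big( \frac{j-1}{m}, \frac{j}{m} \Big].$$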


The set $\Lambda_m$ is endowed with the norm $\|\lambda\| = (\int_0^1 \lambda^2(t)\,dt)^{1/2}$. We introduce the
least-squares functional

$$R_n(\lambda) = \|\lambda\|^2 - \frac{2}{n} \sum_{i=1}^n \int_0^1 \lambda(t)\,dN_i(t),$$

which is the goodness-of-fit criterion to be used in this setting, see among others [30].
Since $\{\lambda_{j,m} : j = 1, \ldots, m\}$ is an orthonormal basis of $\Lambda_m$, we have

$$R_n(\lambda_\beta) = \sum_{j=1}^m \beta_{j,m}^2 - \frac{2\sqrt{m}}{n} \sum_{j=1}^m \sum_{i=1}^n \beta_{j,m} N_i(I_{j,m})$$

for any $\beta \in \mathbb{R}_+^m$. Now, let us introduce the weighted total-variation penalization

$$\|\beta\|_{TV,\hat w} = \sum_{j=2}^m \hat w_j |\beta_{j,m} - \beta_{j-1,m}| \tag{5}$$

for $\beta = [\beta_{j,m}]_{1 \le j \le m} \in \mathbb{R}_+^m$, where $\hat w = [\hat w_j]_{1 \le j \le m}$ is a vector of positive weights (possibly
depending on the data) to be defined later on, with $\hat w_1 = 0$. The data-driven weights $\hat w$
allow a sharp tuning of the total-variation penalization. Then, given $m \ge 1$ and a
weight vector $\hat w$, we introduce

$$\hat\beta = \operatorname{argmin}_{\beta \in \mathbb{R}_+^m} \big\{ R_n(\lambda_\beta) + \|\beta\|_{TV,\hat w} \big\}, \tag{6}$$

hence an estimator of $\lambda_0$ is given by $\hat\lambda = \lambda_{\hat\beta}$. An estimation of the change-point locations
is obtained from the support of the discrete gradient of $\hat\beta$. Namely, define

$$\hat S = \{ j : \hat\beta_{j,m} \ne \hat\beta_{j-1,m} \text{ for } j = 2, \ldots, m \}, \tag{7}$$

and denote by $\hat L = |\hat S|$ the estimated number of change-points.

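We denote the mean counting process by $\bar N_n = n^{-1} \sum_{i=1}^n N_i$ and the unweighted TV
penalization by $\|\beta\|_{TV} = \sum_{j=2}^m |\beta_{j,m} - \beta_{j-1,m}|$, and we use the notation
$\bar N_n(I) = \int_I d\bar N_n(t)$ for any $I \subset [0,1]$.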
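The quantities of this section are simple to compute from binned counts. The Python sketch below uses hypothetical function names and 0-based lists holding $\beta_{1,m}, \ldots, \beta_{m,m}$. It computes the vector $\mathbf{N}_j = \sqrt{m}\, \bar N_n(I_{j,m})$ (the same vector appears in Section 5.1), the criterion $R_n(\lambda_\beta)$, which by the formula above equals $\sum_j \beta_{j,m}^2 - 2 \sum_j \beta_{j,m} \mathbf{N}_j$, the weighted total variation (5) and the support $\hat S$ of (7).

```python
import math
from typing import List, Sequence


def binned_statistic(events_per_copy: Sequence[Sequence[float]], m: int) -> List[float]:
    """N_j = sqrt(m) * Nbar_n(I_{j,m}) for j = 1..m, with I_{j,m} = ((j-1)/m, j/m]."""
    n = len(events_per_copy)
    counts = [0] * m
    for events in events_per_copy:
        for t in events:
            j = math.ceil(m * t)      # t in ((j-1)/m, j/m]  <=>  j = ceil(m t)
            if 1 <= j <= m:           # an event exactly at t = 0 falls in no bin
                counts[j - 1] += 1
    return [math.sqrt(m) * c / n for c in counts]


def least_squares(beta: Sequence[float], N: Sequence[float]) -> float:
    """R_n(lambda_beta) = sum_j beta_j^2 - 2 sum_j beta_j N_j."""
    return sum(b * b - 2.0 * b * y for b, y in zip(beta, N))


def weighted_tv(beta: Sequence[float], w: Sequence[float]) -> float:
    """(5): sum_{j=2}^m w_j |beta_j - beta_{j-1}|; w[0] (that is, w_1) is never used."""
    return sum(w[j] * abs(beta[j] - beta[j - 1]) for j in range(1, len(beta)))


def gradient_support(beta: Sequence[float]) -> List[int]:
    """(7): 1-based indices j in {2, ..., m} such that beta_j != beta_{j-1}."""
    return [j + 1 for j in range(1, len(beta)) if beta[j] != beta[j - 1]]
```

Since $R_n(\lambda_\beta) = \|\mathbf{N} - \beta\|_2^2 - \|\mathbf{N}\|_2^2$, problem (6) is a penalized least-squares fit of $\beta$ to the binned vector $\mathbf{N}$.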


3 Sharp oracle inequalities 

In this section we address the statistical properties of $\hat\lambda$ defined in (6) by proving two oracle
inequalities. Theorem 1 below is an oracle inequality of “slow type” [6] that holds in full
generality, while Theorem 2 is a fast oracle inequality that holds under the assumption
that the number of estimated change-points is upper bounded by a known constant
$L_{\max}$. Both oracle inequalities are sharp in the sense that the constant in front of
the oracle term $\inf_\beta \|\lambda_\beta - \lambda_0\|^2$ is equal to one.

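Theorem 1. Fix $x > 0$ and introduce the data-driven weights

$$\hat w_j = 5.66 \sqrt{\frac{m (x + \log m + \hat h_{n,x,j}) \hat V_j}{n}} + c_2\, \frac{\sqrt{m}\, (x + 1 + \log m + \hat h_{n,x,j})}{n},$$

where $\hat V_j = \bar N_n\big( \big( \frac{j-1}{m}, 1 \big] \big)$,

$$\hat h_{n,x,j} = 2 \log\log \Big( \frac{6en\hat V_j + 14e(x + \log m)}{28(x + \log m)} \vee e \Big),$$

and $c_2$ is the numerical constant obtained in the proof (Section 7.1). Then, if $\hat\lambda$ is given by (6), we have

$$\|\hat\lambda - \lambda_0\|^2 \le \inf_{\beta \in \mathbb{R}_+^m} \big( \|\lambda_\beta - \lambda_0\|^2 + 2\|\beta\|_{TV,\hat w} \big) \tag{8}$$

with a probability larger than $1 - 12.85e^{-x}$.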

The proof of Theorem 1 is postponed to Section 7. We define $\beta_{0,m} = [\beta_{0,j,m}]_{1 \le j \le m}$, the
coefficient vector of the projection of $\lambda_0$ onto $\Lambda_m$, and $\Delta_{\beta,\max} = \max_{1 \le \ell, \ell' \le L_0} |\beta_{0,\ell} - \beta_{0,\ell'}|$, which
is the maximum jump size of $\lambda_0$. Under Assumption 1, a control of the approximation
term leads to the following.


Corollary 1. Given Assumption 1, and under the same assumptions as those of
Theorem 1, we have

$$\|\hat\lambda - \lambda_0\|^2 \le \frac{2(L_0 - 1)\Delta_{\beta,\max}^2}{m} + 2\|\beta_{0,m}\|_{TV} \max_{1 \le j \le m} \hat w_j. \tag{9}$$


The proof of Corollary 1 is given in Section 7. Theorem 1 uses a data-driven weighting
of the TV penalization, based on weights roughly given by

$$\hat w_j \approx \sqrt{\frac{m \log m}{n}\, \bar N_n\Big( \Big( \frac{j-1}{m}, 1 \Big] \Big)}. \tag{10}$$


This exhibits a new scaling of the TV penalization, which is natural and important in
this setting. The shape of this data-driven weighting comes from a Bernstein concentration
inequality with data-driven variance, which is necessary for the control of the noise term (a martingale
with jumps), given in Proposition 1 below, see Section 7.1.
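To see how (10) scales the penalization along the observation interval, the Python sketch below computes the approximate weights $\sqrt{(m \log m / n)\, \bar N_n(((j-1)/m, 1])}$ from per-bin event counts. It deliberately drops the numerical constants and the $\log\log$ correction $\hat h_{n,x,j}$ of Theorem 1, so it only reproduces the order of magnitude of the weights; the function name is hypothetical.

```python
import math
from typing import List, Sequence


def approximate_tv_weights(bin_counts: Sequence[int], n: int) -> List[float]:
    """Rough weights of (10): w_j ~ sqrt((m log m / n) * Nbar_n(((j-1)/m, 1])).

    bin_counts[j-1] is the number of events, pooled over the n copies, falling in
    I_{j,m}; hence Nbar_n(((j-1)/m, 1]) is a tail sum of bin_counts divided by n.
    The first entry is left at 0, matching the convention w_1 = 0 in (5).
    """
    m = len(bin_counts)
    weights = [0.0] * m
    tail = 0
    for j in range(m, 1, -1):            # j = m, m-1, ..., 2
        tail += bin_counts[j - 1]        # number of events in ((j-1)/m, 1]
        weights[j - 1] = math.sqrt(m * math.log(m) / n * (tail / n))
    return weights
```

Because the tail count $\bar N_n(((j-1)/m, 1])$ can only decrease as $j$ grows, these weights are non-increasing in $j$: the penalization is heavier early in the interval, where more events lie to the right.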
Theorem 2. Fix $x > 0$ and let $\hat\lambda$ be the same as in Theorem 1. Assume that the estimated
number of change-points $\hat L$ satisfies $\hat L \le L_{\max}$. Then, we have

$$\|\hat\lambda - \lambda_0\|^2 \le \inf_{\beta \in \mathbb{R}_+^m} \|\lambda_\beta - \lambda_0\|^2 + 6\big(L_{\max} + 2(L_0 - 1)\big) \max_{1 \le j \le m} \hat w_j^2 + K_1 \frac{\|\lambda_0\|_\infty \big(x + L_{\max}(1 + \log m)\big)}{n} + K_2 \frac{m \big(x + L_{\max}(1 + \log m)\big)^2}{n^2} \tag{11}$$

with a probability larger than $1 - L_{\max} e^{-x}$, where $\|\lambda_0\|_\infty = \sup_{t \in [0,1]} |\lambda_0(t)|$, $K_1 = 1670.89$
and $K_2 = 6683.53$.


The proof of Theorem 2 is provided in Section 7. This result proves that our procedure
has a fast rate of convergence of order

$$\frac{(L_{\max} \vee L_0)\, m \log m}{n},$$

which scales with $m/n$.

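Corollary 2. Given Assumption 1, and under the same assumptions as those of
Theorem 2, we have

$$\|\hat\lambda - \lambda_0\|^2 \le \frac{2(L_0 - 1)\Delta_{\beta,\max}^2}{m} + 6\big(L_{\max} + 2(L_0 - 1)\big) \max_{1 \le j \le m} \hat w_j^2 + K_1 \frac{\|\lambda_0\|_\infty \big(x + L_{\max}(1 + \log m)\big)}{n} + K_2 \frac{m \big(x + L_{\max}(1 + \log m)\big)^2}{n^2} \tag{12}$$

with a probability larger than $1 - L_{\max} e^{-x}$, with the same notations as in Theorem 2.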

The proof of Corollary 2 is presented in Section 7. A consequence of Corollary 2 is 
that an optimal tradeoff between approximation and complexity is given by the choice 
m ~ Note that we are able to use the same procedure in Theorems 1 and 2, namely 
for the slow and fast rates, while this is not the case in the signal + white noise setting considered
in [21], for instance.


4 Change-point detection 


In this section we prove that the proposed total-variation procedure with data-driven weights
is consistent for the estimation of the change-point positions. Note, however, that
the context considered here is quite different from the more standard signal + white
noise setting: here we aim at detecting change-points in the intensity function, hence this
problem is prone to an unavoidable non-parametric bias of approximation by a piecewise
constant function. This means that we will not be able to recover the exact positions of
two change-points if they lie in the same interval $I_{j,m}$. Therefore, we make the following assumption.


Assumption 3. Grant Assumption 1 and assume that there is a positive constant $c > 8$
such that

$$\min_{1 \le \ell \le L_0} |\tau_{0,\ell} - \tau_{0,\ell-1}| > \frac{c}{m}. \tag{13}$$
This assumption entails that the change-points of $\lambda_0$ are sufficiently far apart and,
in particular, that there cannot be more than one change-point in each “high-resolution”
interval $I_{j,m}$. Under Assumption 3, the procedure will be able to recover the (unique)
intervals $I_{j_\ell,m}$, for $\ell = 0, \ldots, L_0$, to which the change-points belong. Hence, we define the
approximate change-point sequence $(j_\ell)_{0 \le \ell \le L_0}$ as follows.

Definition 1. The approximate change-point sequence $(j_\ell)_{0 \le \ell \le L_0}$ relative to the level of
resolution $m$ is defined by the right-hand side boundary of the unique interval $I_{j_\ell,m}$ that
contains the change-point $\tau_{0,\ell}$, namely

$$\tau_{0,\ell} \in \Big( \frac{j_\ell - 1}{m}, \frac{j_\ell}{m} \Big] \tag{14}$$

for $\ell = 1, \ldots, L_0 - 1$, where we put $j_0 = 0$ and $j_{L_0} = m$ by convention.

Given the support $\hat S = \{\hat j_1, \ldots, \hat j_{\hat L}\}$, with $\hat j_1 < \cdots < \hat j_{\hat L}$, of the discrete gradient of $\hat\beta$
defined in (7), and introducing $\hat j_0 = 0$ and $\hat j_{\hat L + 1} = m$, we simply define

$$\hat\tau_\ell = \frac{\hat j_\ell}{m} \tag{15}$$

for $\ell = 0, \ldots, \hat L + 1$. In order to prove consistency results for change-point
detection, we need a set of assumptions that quantifies the asymptotic interplay between
several quantities:

• $\Delta_{j,\min} = \min_{1 \le \ell \le L_0 - 1} |j_{\ell+1} - j_\ell|$, which is the minimum distance between two consecutive
terms of the approximate change-point sequence of $\lambda_0$.

• $\Delta_{\beta,\min} = \min_{1 \le q \le m-1} |\beta_{0,q+1,m} - \beta_{0,q,m}|$, which is the smallest jump size of the projection
$\beta_{0,m}$ of $\lambda_0$ onto $\Lambda_m$.

• $(\varepsilon_n)_{n \ge 1}$, a non-increasing and positive sequence that goes to zero as $n \to \infty$, and
such that $m\varepsilon_n > 6$ for any $n \ge 1$.


Assumption 4. We assume that $\Delta_{j,\min}$, $\Delta_{\beta,\min}$ and $(\varepsilon_n)_{n \ge 1}$ satisfy


-y/log m 
k m log m 


oo 


oo 


as $n \to \infty$.


(16) 

(17) 


This assumption controls the rate $(\varepsilon_n)$ of convergence of $\hat\tau_\ell$ towards $\tau_{0,\ell}$. The logarithmic
factor is due to concentration inequalities for the control of the noise (the martingale
$M$ obtained by compensation of $N$). The next theorem proves the consistency of our
procedure for the detection of change-points, under the assumption that the estimated
number of change-points is the correct one.

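Theorem 3. Under Assumptions 3 and 4, and if $\hat L = L_0 - 1$, the change-point
estimators $\{\hat\tau_1, \ldots, \hat\tau_{\hat L}\}$ given by (15) satisfy

$$\mathbb{P}\Big[ \max_{1 \le \ell \le L_0 - 1} |\tau_{0,\ell} - \hat\tau_\ell| \le \varepsilon_n \Big] \to 1 \tag{18}$$

as $n \to \infty$.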










The proof of Theorem 3 is quite involved and is presented in Section 8 and Appendix B.
It builds upon some techniques developed in [21], based on a careful inspection of the
Karush-Kuhn-Tucker (KKT) optimality conditions, see for instance [8], for the solutions
of the convex problem (6). The proof also relies heavily on a data-driven Bernstein
inequality for the control of the martingale errors, see Proposition 1 in Section 7.

Let us give examples of scaling for the quantities $\Delta_{\beta,\min}$ and $(\varepsilon_n)_{n \ge 1}$ that meet Assumption 4. Assume for simplicity that


and n ^ 

for some constants $\alpha, \gamma > 0$.

• If $m =$ then Theorem 3 holds with any $\alpha, \gamma > 0$ satisfying $0 < \gamma < 1/3$ and
$0 < \alpha + \gamma < 2/3$, and if $\Delta_{j,\min} > 6$.

• If $m =$ then Theorem 3 holds with any $0 < \gamma < 1/4$ and $0 < \alpha + \gamma < 3/4$, and
if $\Delta_{j,\min} > 6$.

In order to prove change-point consistency without the assumption that the estimated
number of change-points is the correct one, we need to relax the statement of
Theorem 3 slightly. Namely, we evaluate a non-symmetrized Hausdorff distance
$\mathcal{E}(\hat{\mathcal{T}} \| \mathcal{T}_0)$ between the set of estimated change-points

$$\hat{\mathcal{T}} = \{\hat\tau_1, \ldots, \hat\tau_{\hat L}\}$$

and the set of true change-points

$$\mathcal{T}_0 = \{\tau_{0,1}, \ldots, \tau_{0,L_0-1}\},$$

where, for two sets $A$ and $B$, the quantity $\mathcal{E}(A \| B)$ is given by

$$\mathcal{E}(A \| B) = \sup_{b \in B} \inf_{a \in A} |a - b|.$$


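Note that $\mathcal{E}(A \| B) \vee \mathcal{E}(B \| A)$ is the Hausdorff distance between $A$ and $B$. When $\hat L = L_0 - 1$,
Theorem 3 implies that

$$\mathbb{P}\big[ \mathcal{E}(\hat{\mathcal{T}} \| \mathcal{T}_0) \le \varepsilon_n,\ \mathcal{E}(\mathcal{T}_0 \| \hat{\mathcal{T}}) \le \varepsilon_n \big] \to 1 \tag{19}$$

as $n \to \infty$. When $\hat L \ge L_0 - 1$, we prove in Theorem 4 below that $\mathcal{E}(\hat{\mathcal{T}} \| \mathcal{T}_0) \le \varepsilon_n$ with a
probability going to 1 as $n \to \infty$. This means that change-point consistency holds for our
procedure whenever the estimated number of change-points is not less than the true one.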
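The change-point estimators (15) and the distances used in (19) and Theorem 4 can be computed as follows. This Python sketch uses hypothetical function names; it also computes the approximate change-point indices of Definition 1, which by (14) are $j_\ell = \lceil m\tau_{0,\ell} \rceil$. For empty sets it uses the conventions $\sup \emptyset = 0$ and $\inf \emptyset = +\infty$, which are our own choice.

```python
import math
from typing import List, Sequence


def approximate_indices(taus: Sequence[float], m: int) -> List[int]:
    """Definition 1: j_l = ceil(m * tau_{0,l}), so that tau_{0,l} lies in ((j_l - 1)/m, j_l/m]."""
    return [math.ceil(m * t) for t in taus]


def estimated_change_points(support: Sequence[int], m: int) -> List[float]:
    """(15): tau_hat_l = j_hat_l / m for the sorted support j_hat_1 < ... < j_hat_L."""
    return [j / m for j in sorted(support)]


def excess(A: Sequence[float], B: Sequence[float]) -> float:
    """E(A || B) = sup_{b in B} inf_{a in A} |a - b|."""
    if not B:
        return 0.0
    if not A:
        return math.inf
    return max(min(abs(a - b) for a in A) for b in B)


def hausdorff(A: Sequence[float], B: Sequence[float]) -> float:
    """Hausdorff distance E(A || B) v E(B || A)."""
    return max(excess(A, B), excess(B, A))
```

With $\mathcal{T}_0 = \{0.3, 0.7\}$ and $\hat{\mathcal{T}} = \{0.25, 0.5, 0.75\}$, `excess(T_hat, T_0)` is $0.05$ while `hausdorff(T_hat, T_0)` is $0.2$ (up to rounding): the spurious estimate at $0.5$ does not affect $\mathcal{E}(\hat{\mathcal{T}} \| \mathcal{T}_0)$, which is the situation covered by Theorem 4.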

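Theorem 4. Under Assumptions 3 and 4, and if $\hat L \ge L_0 - 1$, we have

$$\mathbb{P}\big[ \mathcal{E}(\hat{\mathcal{T}} \| \mathcal{T}_0) \le \varepsilon_n \big] \to 1 \tag{20}$$

as $n \to \infty$.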


Theorem 4 ensures that even when the number of change-points is overestimated,
each true change-point is close to an estimated one. The proof of Theorem 4 is given in
Section 9. As for the proof of Theorem 3, it is based on a repeated use of the KKT
optimality conditions of problem (6).

Note that a difference with [21] is that we are able to use the same regularization
parameters $\hat w_j$, given by (10), in Theorems 3 and 4. Besides, we do not need an upper bound
on the estimated number of change-points in Theorem 4, while it is necessary in [28].
5 Numerical experiments

In this section we propose a fast algorithm for solving the optimization problem (6) and
apply it to simulated data and to a real genomics dataset.

5.1 Algorithm 

A concept of importance for convex optimization in machine learning is the proximal
operator [3, 4]. The proximal operator $\mathrm{prox}_f$ of a proper, lower semicontinuous, convex
function $f : \mathbb{R}^m \to (-\infty, \infty]$ is defined as

$$\mathrm{prox}_f(\mathbf{N}) = \operatorname{argmin}_{\beta \in \mathbb{R}^m} \Big\{ \frac{1}{2}\|\mathbf{N} - \beta\|_2^2 + f(\beta) \Big\}.$$

In this section, we provide a fast algorithm to solve the optimization problem (6), which
computes the proximal operator of the weighted total-variation.

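We observe $n$ i.i.d. copies of $N$ over the interval $[0,1]$. Recall that $\bar N_n = n^{-1}\sum_{i=1}^n N_i$
and $\bar N_n(I) = \int_I d\bar N_n(t)$ for any $I \subset [0,1]$. We also recall that $\hat\lambda(t) = \sum_{j=1}^m \hat\beta_j \lambda_{j,m}(t)$,
where $\hat\beta = [\hat\beta_1, \ldots, \hat\beta_m]$ is given by (6). Since $R_n(\lambda_\beta) = \|\mathbf{N} - \beta\|_2^2 - \|\mathbf{N}\|_2^2$, where
$\mathbf{N} = [\mathbf{N}_j]_{1 \le j \le m} \in \mathbb{R}^m$ is given by

$$\mathbf{N}_j = \sqrt{m}\, \bar N_n(I_{j,m}),$$

we have, up to a factor 2 in front of the weights (which is absorbed by the constant selected
by cross-validation in Section 5.2),

$$\hat\beta = \operatorname{argmin}_{\beta \in \mathbb{R}^m} \Big\{ \frac{1}{2}\|\mathbf{N} - \beta\|_2^2 + \|\beta\|_{TV,\hat w} \Big\}. \tag{21}$$

Therefore, we see that (21) is equivalent to

$$\hat\beta = \mathrm{prox}_{\|\cdot\|_{TV,\hat w}}(\mathbf{N}).$$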

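Next, we develop an algorithm that computes $\mathrm{prox}_{\|\cdot\|_{TV,\hat w}}$, which is an extension of [17]
to the weighted total-variation. To this end, we introduce the following $(m-1) \times m$
bidiagonal matrix

$$D_{\hat w} = \begin{bmatrix} -\hat w_2 & \hat w_2 & 0 & \cdots & 0 \\ 0 & -\hat w_3 & \hat w_3 & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & -\hat w_m & \hat w_m \end{bmatrix}.$$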

Then, one can express the primal problem (21) as follows:

$$\hat\beta = \operatorname{argmin}_{\beta \in \mathbb{R}^m} \Big\{ \frac{1}{2}\|\mathbf{N} - \beta\|_2^2 + \|D_{\hat w}\beta\|_1 \Big\}. \tag{22}$$

Problem (22) is difficult to analyse directly because the nondifferentiable $\ell_1$
norm is composed with a linear transformation of $\beta$. To solve (22), we may consider
its Fenchel dual form [4]. First, we rewrite the primal problem as

$$\underset{\beta \in \mathbb{R}^m,\ z \in \mathbb{R}^{m-1}}{\text{minimize}}\ \frac{1}{2}\|\mathbf{N} - \beta\|_2^2 + \|z\|_1 \quad \text{subject to} \quad D_{\hat w}\beta = z,$$

whose Lagrangian is

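$$\mathcal{L}(\beta, z, u) = \frac{1}{2}\|\mathbf{N} - \beta\|_2^2 + \|z\|_1 + u^\top(D_{\hat w}\beta - z),$$

and to derive a dual problem, we minimize this over $\beta$ and $z$. A straightforward computation
gives

$$\min_\beta \Big\{ \frac{1}{2}\|\mathbf{N} - \beta\|_2^2 + u^\top D_{\hat w}\beta \Big\} = \frac{1}{2}\|\mathbf{N}\|_2^2 - \frac{1}{2}\|\mathbf{N} - D_{\hat w}^\top u\|_2^2,$$

while

$$\min_z \big\{ \|z\|_1 - u^\top z \big\} = \begin{cases} 0 & \text{if } \|u\|_\infty \le 1, \\ -\infty & \text{otherwise.} \end{cases}$$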
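The minimization over $\beta$ used above can be spelled out as follows. The objective is a strictly convex quadratic in $\beta$, so it suffices to set its gradient to zero and substitute. The last line writes the minimizer in coordinates with the conventions $u_0 = u_m = 0$; replacing $u$ by $-u$ there gives the expression that appears in the dual problem and in (23) below.

```latex
\begin{aligned}
&\nabla_\beta\Big(\tfrac12\|\mathbf N-\beta\|_2^2 + u^\top D_{\hat w}\beta\Big)
  = \beta - \mathbf N + D_{\hat w}^\top u = 0
  \;\Longrightarrow\; \beta(u) = \mathbf N - D_{\hat w}^\top u,\\
&\tfrac12\|D_{\hat w}^\top u\|_2^2 + u^\top D_{\hat w}\big(\mathbf N - D_{\hat w}^\top u\big)
  = u^\top D_{\hat w}\mathbf N - \tfrac12\|D_{\hat w}^\top u\|_2^2
  = \tfrac12\|\mathbf N\|_2^2 - \tfrac12\|\mathbf N - D_{\hat w}^\top u\|_2^2,\\
&(D_{\hat w}^\top u)_k = -\hat w_{k+1}u_k + \hat w_k u_{k-1}
  \quad\Longrightarrow\quad
  \beta(u)_k = \mathbf N_k + \hat w_{k+1}u_k - \hat w_k u_{k-1},
  \qquad k = 1,\ldots,m.
\end{aligned}
```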


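Introducing $u_0 = u_m = 0$ and replacing $u$ by $-u$ (which leaves the constraint $\|u\|_\infty \le 1$ unchanged), we obtain that a dual problem of (22) is given by

$$\underset{u \in \mathbb{R}^{m+1}}{\text{minimize}}\ \frac{1}{2}\sum_{k=1}^m \big( \mathbf{N}_k - \hat w_{k+1}u_k + \hat w_k u_{k-1} \big)^2 \quad \text{subject to} \quad |u_k| \le 1 \text{ for } k = 1, \ldots, m-1, \text{ and } u_0 = u_m = 0.$$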

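Given a solution $u$ of this dual problem, we can compute the primal solution $\hat\beta$ using

$$\hat\beta_k = \mathbf{N}_k - \hat w_{k+1}u_k + \hat w_k u_{k-1}, \quad \text{for } k = 1, \ldots, m. \tag{23}$$

For this problem, strong duality holds, see [8], meaning that the duality gap is zero. The
KKT optimality conditions characterize the unique solutions $\hat\beta$ and $\theta_k := \hat w_{k+1}u_k$. They
yield, in addition to (23):

$$\begin{cases} \theta_k \in [-\hat w_{k+1}, \hat w_{k+1}] & \text{if } \hat\beta_k = \hat\beta_{k+1}, \\ \theta_k = -\hat w_{k+1} & \text{if } \hat\beta_k < \hat\beta_{k+1}, \\ \theta_k = \hat w_{k+1} & \text{if } \hat\beta_k > \hat\beta_{k+1}. \end{cases} \tag{24}$$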

Therefore, the proposed algorithm consists in running forward through the samples
$[\mathbf{N}_k]_{1 \le k \le m}$. Using (24), at location $k$, $\hat\beta_k$ stays constant as long as $|\theta_k| < \hat w_{k+1}$. If this is not
possible, the algorithm goes back to the last location where a jump can be introduced in $\hat\beta$, validates the
current segment up to this location, starts a new segment, and continues. This algorithm
is described precisely in Algorithm 1.
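The following Python sketch transcribes Algorithm 1 step by step, using 1-based lists padded with a dummy entry, together with a check of the optimality conditions (23)-(24). It assumes the boundary conventions $\hat w_1 = \hat w_{m+1} = 0$, so that $\theta_0 = \theta_m = 0$. The function names and the argument `w_edges`, which holds $(\hat w_2, \ldots, \hat w_m)$, are hypothetical. As a safeguard against stale indices, each segment assignment writes at least one entry.

```python
from typing import List, Sequence


def prox_weighted_tv(N: Sequence[float], w_edges: Sequence[float]) -> List[float]:
    """argmin_beta 0.5 * ||N - beta||_2^2 + sum_{j=2}^m w_j |beta_j - beta_{j-1}|.

    w_edges = [w_2, ..., w_m] must be nonnegative. Transcription of Algorithm 1
    with the boundary conventions w_1 = w_{m+1} = 0.
    """
    m = len(N)
    if m == 0 or len(w_edges) != m - 1:
        raise ValueError("need len(N) >= 1 and len(w_edges) == len(N) - 1")
    y = [0.0] + [float(v) for v in N]                     # y[k] = N_k, k = 1..m
    w = [0.0, 0.0] + [float(v) for v in w_edges] + [0.0]  # w[k] = w_k, k = 1..m+1
    beta = [0.0] * (m + 1)

    def fill(start: int, stop: int, value: float) -> int:
        # beta_start = ... = beta_stop = value (at least one entry); returns stop + 1
        stop = max(stop, start)
        for j in range(start, stop + 1):
            beta[j] = value
        return stop + 1

    k = k0 = km = kp = 1                                  # step 1
    bmin, bmax, tmin, tmax = y[1] - w[2], y[1] + w[2], w[2], -w[2]
    while True:
        if k == m:                                        # step 2
            beta[m] = bmin + tmin
            return beta[1:]
        while k < m:                                      # steps 3 to 6
            if y[k + 1] + tmin < bmin - w[k + 2]:         # step 3: negative jump
                k = k0 = km = kp = fill(k0, km, bmin)
                bmin, bmax = y[k] - w[k + 1] + w[k], y[k] + w[k + 1] + w[k]
                tmin, tmax = w[k + 1], -w[k + 1]
            elif y[k + 1] + tmax > bmax + w[k + 2]:       # step 4: positive jump
                k = k0 = km = kp = fill(k0, kp, bmax)
                bmin, bmax = y[k] - w[k + 1] - w[k], y[k] + w[k + 1] - w[k]
                tmin, tmax = w[k + 1], -w[k + 1]
            else:                                         # step 5: no jump
                k += 1
                tmin += y[k] - bmin
                tmax += y[k] - bmax
                if tmin >= w[k + 1]:
                    bmin += (tmin - w[k + 1]) / (k - k0 + 1)
                    tmin, km = w[k + 1], k
                if tmax <= -w[k + 1]:
                    bmax += (tmax + w[k + 1]) / (k - k0 + 1)
                    tmax, kp = -w[k + 1], k
        if tmin < 0:                                      # step 7
            k = k0 = km = fill(k0, km, bmin)
            bmin = y[k] - w[k + 1] + w[k]
            tmin, tmax = w[k + 1], y[k] + w[k] - bmax
        elif tmax > 0:                                    # step 8
            k = k0 = kp = fill(k0, kp, bmax)
            bmax = y[k] + w[k + 1] - w[k]
            tmax, tmin = -w[k + 1], y[k] - w[k] - bmin
        else:                                             # step 9
            fill(k0, m, bmin + tmin / (k - k0 + 1))
            return beta[1:]


def kkt_violation(N: Sequence[float], w_edges: Sequence[float],
                  beta: Sequence[float], tol: float = 1e-9) -> float:
    """Largest violation of (23)-(24), with theta_k = theta_{k-1} + N_k - beta_k, theta_0 = 0."""
    m = len(N)
    theta, worst = 0.0, 0.0
    for k in range(m):                      # 0-based k stands for index k + 1
        theta += N[k] - beta[k]
        if k == m - 1:
            worst = max(worst, abs(theta))  # boundary condition theta_m = 0
            break
        wk = w_edges[k]                     # bound w_{k+2} on theta_{k+1}
        worst = max(worst, abs(theta) - wk)
        if beta[k] < beta[k + 1] - tol:     # upward jump: theta = -w
            worst = max(worst, abs(theta + wk))
        elif beta[k] > beta[k + 1] + tol:   # downward jump: theta = +w
            worst = max(worst, abs(theta - wk))
    return worst
```

For example, `prox_weighted_tv([0, 0, 10, 10], [1, 1, 1])` returns `[0.5, 0.5, 9.5, 9.5]`: each plateau is pulled towards the other by $\hat w / (\text{plateau length}) = 1/2$, and `kkt_violation` applied to this output is $0$.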


5.2 Simulated data 

We conduct simulations on two examples of intensities. We simulate counting processes with
inhomogeneous piecewise constant intensities $\lambda_0$, with 5 and 15 change-points respectively, see Figure 1, and with
an increasing sample size $n$. In order to assess the performance of the total-variation
procedure $\hat\lambda$, we use as a performance measure the Monte-Carlo averaged mean integrated squared error (MISE),
given by

$$\mathrm{MISE}(\hat\lambda, \lambda_0) = \mathbb{E} \int_0^1 \big( \hat\lambda(t) - \lambda_0(t) \big)^2 dt.$$
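For piecewise constant functions, the integrated squared error inside the MISE can be computed exactly by merging the breakpoints of the two functions; the MISE is then its average over the Monte-Carlo replications. The Python sketch below uses hypothetical names and assumes each function is given by its breakpoints $0 = t_0 < \cdots < t_L = 1$ and its levels on the pieces $(t_{\ell-1}, t_\ell]$ (for $\hat\lambda$, the levels are $\sqrt{m}\,\hat\beta_{j,m}$ on the bins $I_{j,m}$).

```python
from bisect import bisect_left
from typing import Sequence


def step_value(breaks: Sequence[float], values: Sequence[float], t: float) -> float:
    """Level of the step function on the piece (breaks[i], breaks[i+1]] containing t."""
    i = bisect_left(breaks, t) - 1
    if not 0 <= i < len(values):
        raise ValueError("t must lie in (breaks[0], breaks[-1]]")
    return values[i]


def integrated_squared_error(breaks_a: Sequence[float], values_a: Sequence[float],
                             breaks_b: Sequence[float], values_b: Sequence[float]) -> float:
    """int_0^1 (f_a(t) - f_b(t))^2 dt for two step functions on [0, 1]."""
    grid = sorted(set(breaks_a) | set(breaks_b))
    total = 0.0
    for left, right in zip(grid[:-1], grid[1:]):
        mid = 0.5 * (left + right)      # both functions are constant on (left, right]
        diff = step_value(breaks_a, values_a, mid) - step_value(breaks_b, values_b, mid)
        total += diff * diff * (right - left)
    return total


def mise(errors: Sequence[float]) -> float:
    """Monte-Carlo average of the integrated squared errors of the replications."""
    return sum(errors) / len(errors)
```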

We run 100 Monte-Carlo experiments for each of the two examples, with sample sizes increasing from $n = 500$ to
$n = 30000$. In Figure 2, we plot the MISEs of the weighted and
the unweighted total-variation (namely $w_j = 1$) for the two examples, as a function of the
sample size. We observe in Figure 2 that the estimation error always decreases with
the sample size, and that both procedures behave similarly. Differences can be observed
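Algorithm 1: $\hat\beta = \mathrm{prox}_{\|\cdot\|_{TV,\hat w}}(\mathbf{N})$

Input: $\mathbf{N} = (\mathbf{N}_1, \ldots, \mathbf{N}_m)^\top \in \mathbb{R}^m$ and the weights $\hat w_2, \ldots, \hat w_m$ (with $\hat w_{m+1} = 0$).

Output: $\hat\beta = (\hat\beta_1, \ldots, \hat\beta_m)^\top$.

1. Set $k = k_0 = k_- = k_+ \leftarrow 1$;
   $\beta_{\min} \leftarrow \mathbf{N}_1 - \hat w_2$, $\beta_{\max} \leftarrow \mathbf{N}_1 + \hat w_2$;
   $\theta_{\min} \leftarrow \hat w_2$, $\theta_{\max} \leftarrow -\hat w_2$;
2. if $k = m$ then
   set $\hat\beta_m \leftarrow \beta_{\min} + \theta_{\min}$ and terminate;
3. if $\mathbf{N}_{k+1} + \theta_{\min} < \beta_{\min} - \hat w_{k+2}$ then /* negative jump */
   $\hat\beta_{k_0} = \cdots = \hat\beta_{k_-} \leftarrow \beta_{\min}$;
   $k = k_0 = k_- = k_+ \leftarrow k_- + 1$;
   $\beta_{\min} \leftarrow \mathbf{N}_k - \hat w_{k+1} + \hat w_k$, $\beta_{\max} \leftarrow \mathbf{N}_k + \hat w_{k+1} + \hat w_k$;
   $\theta_{\min} \leftarrow \hat w_{k+1}$, $\theta_{\max} \leftarrow -\hat w_{k+1}$;
4. else if $\mathbf{N}_{k+1} + \theta_{\max} > \beta_{\max} + \hat w_{k+2}$ then /* positive jump */
   $\hat\beta_{k_0} = \cdots = \hat\beta_{k_+} \leftarrow \beta_{\max}$;
   $k = k_0 = k_- = k_+ \leftarrow k_+ + 1$;
   $\beta_{\min} \leftarrow \mathbf{N}_k - \hat w_{k+1} - \hat w_k$, $\beta_{\max} \leftarrow \mathbf{N}_k + \hat w_{k+1} - \hat w_k$;
   $\theta_{\min} \leftarrow \hat w_{k+1}$, $\theta_{\max} \leftarrow -\hat w_{k+1}$;
5. else /* no jump */
   set $k \leftarrow k + 1$;
   $\theta_{\min} \leftarrow \theta_{\min} + \mathbf{N}_k - \beta_{\min}$;
   $\theta_{\max} \leftarrow \theta_{\max} + \mathbf{N}_k - \beta_{\max}$;
   if $\theta_{\min} \ge \hat w_{k+1}$ then
      $\beta_{\min} \leftarrow \beta_{\min} + \frac{\theta_{\min} - \hat w_{k+1}}{k - k_0 + 1}$; $\theta_{\min} \leftarrow \hat w_{k+1}$; $k_- \leftarrow k$;
   if $\theta_{\max} \le -\hat w_{k+1}$ then
      $\beta_{\max} \leftarrow \beta_{\max} + \frac{\theta_{\max} + \hat w_{k+1}}{k - k_0 + 1}$; $\theta_{\max} \leftarrow -\hat w_{k+1}$; $k_+ \leftarrow k$;
6. if $k < m$ then
   go to 3.;
7. if $\theta_{\min} < 0$ then
   $\hat\beta_{k_0} = \cdots = \hat\beta_{k_-} \leftarrow \beta_{\min}$;
   $k = k_0 = k_- \leftarrow k_- + 1$;
   $\beta_{\min} \leftarrow \mathbf{N}_k - \hat w_{k+1} + \hat w_k$;
   $\theta_{\min} \leftarrow \hat w_{k+1}$, $\theta_{\max} \leftarrow \mathbf{N}_k + \hat w_k - \beta_{\max}$;
   go to 2.;
8. else if $\theta_{\max} > 0$ then
   $\hat\beta_{k_0} = \cdots = \hat\beta_{k_+} \leftarrow \beta_{\max}$;
   $k = k_0 = k_+ \leftarrow k_+ + 1$;
   $\beta_{\max} \leftarrow \mathbf{N}_k + \hat w_{k+1} - \hat w_k$;
   $\theta_{\max} \leftarrow -\hat w_{k+1}$, $\theta_{\min} \leftarrow \mathbf{N}_k - \hat w_k - \beta_{\min}$;
   go to 2.;
9. else
   set $\hat\beta_{k_0} = \cdots = \hat\beta_m \leftarrow \beta_{\min} + \frac{\theta_{\min}}{k - k_0 + 1}$ and terminate.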
below, using a genomics dataset. On each simulated dataset, we perform a 10-fold cross-validation
to select the best constant to use in front of the weights $\hat w_j$ (both for the
weighted and the unweighted total-variation). Cross-validation in this context is achieved by
choosing uniformly at random a label between 1 and 10 for each point, and by using the points
with label $k$ as the $k$-th testing fold and removing these points from the $k$-th training fold.
The estimated intensity is corrected accordingly, since removing uniformly at random
a fraction of the points of a counting process biases the intensity downwards by the same
fraction.
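The fold construction and the intensity correction described above can be sketched as follows in Python. The uniform labels and the rescaling by $K/(K-1)$ follow the text (with $K = 10$ folds). The held-out score `heldout_contrast`, an empirical version of the least-squares criterion $R_n$ evaluated on the test points, is our own illustrative choice, since the text does not state which criterion is evaluated on the test folds. All names are hypothetical.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def cv_folds(events: Sequence[float], n_folds: int = 10,
             seed: int = 0) -> Iterator[Tuple[List[float], List[float], float]]:
    """Yield (train, test, correction) for k = 1..n_folds.

    Every pooled event receives a label drawn uniformly in {1, ..., n_folds};
    fold k tests on label k and trains on the others. Removing each point with
    probability 1/n_folds multiplies the intensity by (1 - 1/n_folds), so an
    intensity fitted on `train` should be multiplied by `correction`.
    """
    rng = random.Random(seed)
    labels = [rng.randint(1, n_folds) for _ in events]
    correction = n_folds / (n_folds - 1)
    for k in range(1, n_folds + 1):
        train = [t for t, lab in zip(events, labels) if lab != k]
        test = [t for t, lab in zip(events, labels) if lab == k]
        yield train, test, correction


def heldout_contrast(levels: Sequence[float], test: Sequence[float],
                     n: int, n_folds: int) -> float:
    """Hypothetical score: int lambda^2 - (2 * n_folds / n) * sum_{t in test} lambda(t),
    where lambda equals levels[j-1] on I_{j,m}, m = len(levels). The factor n_folds
    compensates for the test fold keeping about 1/n_folds of the points."""
    m = len(levels)
    energy = sum(v * v for v in levels) / m
    fit = sum(levels[min(max(math.ceil(m * t), 1), m) - 1] for t in test)
    return energy - 2.0 * n_folds / n * fit
```

When the events are pooled over the $n$ copies, each point is kept in a given test fold with probability $1/K$, so the expected score is $\|\lambda\|^2 - 2\int_0^1 \lambda\lambda_0 = \|\lambda - \lambda_0\|^2 - \|\lambda_0\|^2$, which is why smaller scores indicate a better constant.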





Figure 1: Intensities used for Example 1 (left) and Example 2 (right), respectively with 5 
and 15 change-points 



Figure 2: Average MISEs (bold lines) over 100 Monte-Carlo experiments and standard 
deviations of the MISEs (dashed lines). First: weighted TV for Example 1; Second: non- 
weighted TV for Example 1; Third: weighted TV for Example 2; Fourth: non-weighted 
TV for Example 2 


5.3 Real data 

Our method is illustrated on the NCI-60 tumor and normal cell lines HCC1954 and BL1954.
This dataset was produced and investigated by [14] using the Illumina platform, where
the reads are 36bp long. After cleaning of this data, there are 7.72 million reads for the
tumor (HCC1954) and 6.65 million reads for the normal (BL1954) samples, respectively.
The sampling process for such data is described in the Introduction. We
show in Figures 3 and 4 both the tumor and the normal cell line data. These data consist of a list of
read numbers, see Figure 3, where we plot a zoomed sequence of reads. For visualization
purposes, we give in Figure 4 the binned counts of reads over 10000 equispaced intervals
on the range of reads.

In Figure 5 we plot the best solution of the weighted and unweighted ($w_j = 1$) total-variation
estimators on the normal and tumor read data. For easier visualization we




































Figure 3: A zoom into the sequence of reads for normal (left) and tumor (right) data 



Figure 4: Binned counts of reads (log-scale) of the normal (left) and tumor (right) data 



Figure 5: A zoom between reads number 0 and SOM of the weighted (left) and unweighted 
(right) total-variation estimators applied to the tumor (top) and normal (bottom) data 
plot a zoom of the read sequence. We perform a 10-fold cross-validation to select the
best constant to use in front of the weights $\hat w_j$ (both for the weighted and unweighted
total-variation), as explained above. We observe in this figure that the weighted total-variation
gives sharper results: the piecewise constant intensity is smoother, and the
obtained change-point locations seem, at least visually, better. An important fact is
that Algorithm 1 is extremely fast: a solution is obtained in less than one
millisecond on a modern laptop (the implementation uses Python with a C extension).
This is due to the fact that the running time of Algorithm 1 is typically linear in the signal size.

6 Conclusion 

In this work, we show that convex optimization is a powerful tool for the detection of change-points in the
intensity of a counting process. We introduce a data-driven weighted
total-variation penalization for this problem, with sharply tuned regularization parameters,
and prove two families of theoretical results: oracle inequalities for the prediction
error, and consistency in the estimation of change-points. We illustrate our
approach numerically via simulations and an application to a genomics dataset. Future directions for this
work are the study of maximum likelihood estimation instead of least squares, and a multivariate
extension of the proposed algorithm.

7 Proof of Theorems 1 and 2 

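Introduce $\mu = [\mu_j]_{1 \le j \le m} \in \mathbb{R}^m$ given by $\mu_1 = \beta_1$ and $\mu_j = \beta_j - \beta_{j-1}$ for $j = 2, \ldots, m$.
Then, we have $\beta = T\mu$, where $T$ is the $m \times m$ lower triangular matrix with entries
$(T)_{j,k} = 0$ if $j < k$ and $(T)_{j,k} = 1$ otherwise. Note that $\hat\beta = T\hat\mu$, where

$$\hat\mu = \operatorname{argmin}_{\mu \in \mathbb{R}^m} \Big\{ \frac{1}{2}\|\mathbf{N} - T\mu\|_2^2 + \sum_{j=2}^m \hat w_j |\mu_j| \Big\}. \tag{25}$$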
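The change of variables $\beta = T\mu$ and the summation identity used in the proof of Theorem 1 below, $\sum_j (T\mu)_j a_j = \sum_j \mu_j \sum_{q \ge j} a_q$ (that is, $T^\top a$ collects suffix sums), can be checked numerically with the following plain-Python sketch; the helper names are hypothetical.

```python
from typing import List, Sequence


def increments(beta: Sequence[float]) -> List[float]:
    """mu_1 = beta_1 and mu_j = beta_j - beta_{j-1} for j = 2..m."""
    return [beta[0]] + [beta[j] - beta[j - 1] for j in range(1, len(beta))]


def apply_T(mu: Sequence[float]) -> List[float]:
    """(T mu)_j = mu_1 + ... + mu_j, T being lower triangular with ones."""
    out, acc = [], 0.0
    for v in mu:
        acc += v
        out.append(acc)
    return out


def apply_T_transpose(a: Sequence[float]) -> List[float]:
    """(T^T a)_j = a_j + ... + a_m (suffix sums)."""
    out, acc = [0.0] * len(a), 0.0
    for j in range(len(a) - 1, -1, -1):
        acc += a[j]
        out[j] = acc
    return out


def weighted_l1(mu: Sequence[float], w: Sequence[float]) -> float:
    """sum_{j=2}^m w_j |mu_j|, which equals ||T mu||_{TV,w}."""
    return sum(w[j] * abs(mu[j]) for j in range(1, len(mu)))
```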


7.1 Proof of Theorem 1 

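This proof follows a standard argument for proving slow oracle inequalities, see for instance
[6]. Due to the Doob-Meyer decomposition theorem, we have

$$R_n(\lambda) = \|\lambda - \lambda_0\|^2 - \|\lambda_0\|^2 - 2\int_0^1 \lambda(t)\,dM_n(t),$$

where $M_n = n^{-1}\sum_{i=1}^n M_i$ is the martingale part of $\bar N_n$, which leads to

$$\hat\lambda = \lambda_{\hat\beta}, \quad \hat\beta = \operatorname{argmin}_{\beta \in \mathbb{R}_+^m} \Big( \|\lambda_\beta - \lambda_0\|^2 - 2\int_0^1 \lambda_\beta(t)\,dM_n(t) + \|\beta\|_{TV,\hat w} \Big). \tag{26}$$

Then, using (26), we obtain

$$\|\hat\lambda - \lambda_0\|^2 \le \inf_{\beta \in \mathbb{R}_+^m} \Big( \|\lambda_\beta - \lambda_0\|^2 + \frac{2}{n}\nu_n(\hat\lambda - \lambda_\beta) + \|\beta\|_{TV,\hat w} - \|\hat\beta\|_{TV,\hat w} \Big), \tag{27}$$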
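where $\nu_n(\lambda) = \sum_{i=1}^n \int_0^1 \lambda(t)\,dM_i(t)$ is a centered empirical process. Note that, since $\beta = T\mu$,

$$\begin{aligned} \frac{1}{n}\nu_n(\hat\lambda - \lambda_\beta) &= \sum_{j=1}^m (\hat\beta_{j,m} - \beta_{j,m}) \int_0^1 \lambda_{j,m}(t)\,dM_n(t) \\ &= \sum_{j=1}^m \big( (T\hat\mu)_j - (T\mu)_j \big) \int_0^1 \lambda_{j,m}(t)\,dM_n(t) \\ &= \sum_{j=1}^m (\hat\mu_j - \mu_j) \sum_{q=j}^m \int_0^1 \lambda_{q,m}(t)\,dM_n(t). \end{aligned}$$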


Define the event $\Omega_n$ by

$$\Omega_n = \bigcap_{j=1}^m \Big\{ \Big| \sum_{q=j}^m \int_0^1 \lambda_{q,m}(t)\,dM_n(t) \Big| \le \frac{\hat w_j}{2} \Big\}. \tag{28}$$

The probabilistic control of $\Omega_n$ is given in Proposition 1 below. It relies on
a slight modification of an empirical Bernstein inequality from [18], see also [30]. On $\Omega_n$,
we have using (28)

$$\frac{2}{n}\nu_n(\hat\lambda - \lambda_\beta) \le \sum_{j=1}^m \hat w_j |\hat\mu_j - \mu_j|.$$

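Using (27), we obtain

$$\begin{aligned} \|\hat\lambda - \lambda_0\|^2 &\le \|\lambda_\beta - \lambda_0\|^2 + \sum_{j=1}^m \hat w_j |\hat\mu_j - \mu_j| + \sum_{j=1}^m \hat w_j \big( |\mu_j| - |\hat\mu_j| \big) \\ &\le \|\lambda_\beta - \lambda_0\|^2 + 2\sum_{j=1}^m \hat w_j |\mu_j| \\ &= \|\lambda_\beta - \lambda_0\|^2 + 2\|\beta\|_{TV,\hat w}. \end{aligned}$$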

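Then, on $\Omega_n$, (8) in Theorem 1 holds true. It remains now to control $\mathbb{P}(\Omega_n^c)$. Recalling
that $\lambda_{j,m}(t) = \sqrt{m}\,\mathbb{1}_{(\frac{j-1}{m}, \frac{j}{m}]}(t)$, so that $\sum_{q=j}^m \lambda_{q,m} = \sqrt{m}\,\mathbb{1}_{(\frac{j-1}{m}, 1]}$, we have

$$\mathbb{P}(\Omega_n^c) \le \sum_{j=1}^m \mathbb{P}\Big[ \Big| \int_0^1 \mathbb{1}_{(\frac{j-1}{m}, 1]}(t)\,dM_n(t) \Big| > \frac{\hat w_j}{2\sqrt{m}} \Big],$$

so we need to control the tails of

$$U_j = \int_0^1 \mathbb{1}_{(\frac{j-1}{m}, 1]}(t)\,dM_n(t),$$

which is the goal of the next proposition.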

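Proposition 1. For any numerical constants $c_h > 1$, $\varepsilon > 0$ and $c_0 > 0$ such that $\varepsilon c_0 > 2(4/3 + \varepsilon)c_h$, the following holds for any $z > 0$:

$$\mathbb{P}\Big[|U_j| \ge c_{1,\varepsilon}\sqrt{\frac{z + \hat h_{n,z,j}}{n}\hat V_j} + c_{3,\varepsilon}\frac{z + 1 + \hat h_{n,z,j}}{n}\Big] \le c\,e^{-z},$$

where

$$\hat h_{n,z,j} = c_h\log\log\Big(\frac{2\varepsilon n\hat V_j + 2\varepsilon(4/3 + \varepsilon)z}{\varepsilon c_0(z + 1) - 2(4/3 + \varepsilon)c_h}\vee e\Big),$$

$c_{1,\varepsilon} = 2\sqrt{1 + \varepsilon}$, $c_{3,\varepsilon} = \sqrt{2\max\big(c_0, 2(1 + \varepsilon)(4/3 + \varepsilon)\big)} + 1/3$, and $c = 6 + 4(\log(1 + \varepsilon))^{-c_h}\sum_{q\ge1}q^{-c_h}$.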


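The proof of Proposition 1 is given in Appendix A.1. Choosing $z = x + \log m$, it yields that

$$\sum_{j=1}^m\mathbb{P}\Big[|U_j| \ge c_{1,\varepsilon}\sqrt{\frac{x + \log m + \hat h_{n,x,j}}{n}\hat V_j} + c_{3,\varepsilon}\frac{x + \log m + \hat h_{n,x,j} + 1}{n}\Big] \le \Big(6 + 4(\log(1 + \varepsilon))^{-c_h}\sum_{q\ge1}q^{-c_h}\Big)e^{-x},$$

where

$$\hat h_{n,x,j} = c_h\log\log\Big(\frac{2\varepsilon n\hat V_j + 2\varepsilon(4/3 + \varepsilon)(x + \log m)}{\varepsilon c_0(x + \log m + 1) - 2(4/3 + \varepsilon)c_h}\vee e\Big).$$

Then, the choice of data-driven weights

$$\hat w_j = c_1\sqrt{\frac{m(x + \log m + \hat h_{n,x,j})\hat V_j}{n}} + c_2\frac{\sqrt m(x + 1 + \log m + \hat h_{n,x,j})}{n},$$

where $c_1 = 2c_{1,\varepsilon}$ and $c_2 = 2c_{3,\varepsilon}$, gives $\mathbb{P}(\Omega_n^c) \le c\,e^{-x}$. Finally, to get the numerical constants in Theorem 1, we set $\varepsilon = 1$, $c_h = 2$, and $c_0 = 28e/3$ in Proposition 1. □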
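The weights can be evaluated directly from pooled event times. The sketch below is ours and rests on stated assumptions: it takes $\hat V_j = \frac1n\sum_{i=1}^n N_i(((j-1)/m, 1])$ as in Appendix A.1, it takes $c_1$ and $c_2$ as inputs rather than deriving them from $c_{1,\varepsilon}$ and $c_{3,\varepsilon}$, its defaults $\varepsilon = 1$, $c_h = 2$, $c_0 = 28e/3$ follow the choice made above, and the function name `data_driven_weights` is hypothetical.

```python
import math

def data_driven_weights(events, n, m, x, c1, c2, eps=1.0, c_h=2.0, c0=28 * math.e / 3):
    """Hypothetical helper: weights hat w_j, j = 1..m, from pooled event times.

    events: all event times in (0, 1] of the n observed processes, pooled together.
    c1, c2: constants of the weight formula, supplied by the caller.
    """
    z = x + math.log(m)                      # z = x + log m
    denom = eps * c0 * (z + 1) - 2 * (4 / 3 + eps) * c_h
    if denom <= 0:
        raise ValueError("need eps * c0 * (z + 1) > 2 * (4/3 + eps) * c_h")
    weights = []
    for j in range(1, m + 1):
        # hat V_j: average number of events per process in ((j - 1)/m, 1]
        v_hat = sum(1 for t in events if t > (j - 1) / m) / n
        ratio = (2 * eps * n * v_hat + 2 * eps * (4 / 3 + eps) * z) / denom
        h = c_h * math.log(math.log(max(ratio, math.e)))   # hat h_{n,x,j}
        w = (c1 * math.sqrt(m * (z + h) * v_hat / n)
             + c2 * math.sqrt(m) * (z + 1 + h) / n)
        weights.append(w)
    return weights

# Usage with made-up event times (n = 2 processes, m = 4 bins).
w = data_driven_weights([0.05, 0.2, 0.2, 0.5, 0.9, 0.95], n=2, m=4, x=1.0, c1=1.0, c2=1.0)
```

Since $\hat V_j$ is nonincreasing in $j$, so are the weights computed this way.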

7.2 Proof of Corollary 1 

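We denote by $\lambda_{0,m}$ the projection of $\lambda_0$ onto $\Lambda_m$, that is $\lambda_{0,m} = \operatorname{argmin}_{\lambda\in\Lambda_m}\|\lambda - \lambda_0\|^2$. Using Pythagoras' theorem, we have

$$\|\hat\lambda - \lambda_0\|^2 \le \|\lambda_{0,m} - \lambda_0\|^2 + \|\hat\lambda - \lambda_{0,m}\|^2.$$

By the proof of Theorem 1, we obtain

$$\|\hat\lambda - \lambda_{0,m}\|^2 \le 2\|\beta_{0,m}\|_{TV,\hat w} \le 2\|\beta_{0,m}\|_{TV}\max_{1\le j\le m}\hat w_j.$$

Now, the following approximation lemma comes in handy for the control of the bias term.

Lemma 1. Given Assumption 1, we have

$$\|\lambda_{0,m} - \lambda_0\|^2 \le \frac{2(L_0 - 1)\Delta_{\beta,\max}^2}{m},$$

where $\Delta_{\beta,\max} = \max_{1\le\ell,\ell'\le L_0}|\beta_{0,\ell} - \beta_{0,\ell'}|$.

The proof of Lemma 1 is given in Appendix A.2.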


□ 
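Lemma 1 can be checked numerically on a toy intensity. In the sketch below (ours, with hypothetical helper names), $\lambda_{0,m}$ is computed as the bin-wise average of $\lambda_0$, which is what the projection amounts to if $\Lambda_m$ is spanned by the functions $\lambda_{j,m} = \sqrt m\,\mathbb{1}_{I_{j,m}}$; the squared $L^2$ bias is then compared with $2(L_0 - 1)\Delta_{\beta,\max}^2/m$.

```python
def projection_error(taus, betas, m):
    """Squared L2 distance between a piecewise-constant lambda_0 and its
    projection onto histograms with m regular bins on [0, 1].

    taus: increasing change-points in (0, 1); betas: the L_0 = len(taus) + 1 levels.
    """
    knots = [0.0] + list(taus) + [1.0]
    err = 0.0
    for j in range(1, m + 1):
        lo, hi = (j - 1) / m, j / m
        # (length, level) of each piece of lambda_0 inside I_{j,m}
        pieces = []
        for ell, level in enumerate(betas):
            a, b = max(lo, knots[ell]), min(hi, knots[ell + 1])
            if b > a:
                pieces.append((b - a, level))
        avg = sum(length * level for length, level in pieces) / (hi - lo)
        err += sum(length * (level - avg) ** 2 for length, level in pieces)
    return err

def lemma1_bound(betas, m):
    """Right-hand side 2 (L_0 - 1) Delta_{beta,max}^2 / m of Lemma 1."""
    delta = max(betas) - min(betas)
    return 2 * (len(betas) - 1) * delta ** 2 / m
```

Only the bins that contain a change-point contribute to the error, and when every change-point lies on the grid the bias vanishes.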


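7.3 Proof of Theorem 2

Using Pythagoras' identity, we obtain the following decomposition

$$\|\hat\lambda - \lambda_0\|^2 = \|\lambda_{0,m} - \lambda_0\|^2 + \|\hat\lambda - \lambda_{0,m}\|^2.$$

In view of the fact that $\{\lambda_{j,m} : j = 1,\ldots,m\}$ is an orthonormal basis of $\Lambda_m$, we have $\|\hat\lambda - \lambda_{0,m}\| = \|\hat\beta - \beta_{0,m}\|_2$, and by the definition of $\hat\beta$, we get

$$\|\hat\beta - \hat N\|_2^2 + \sum_{j=2}^m\hat w_j|\hat\beta_{j,m} - \hat\beta_{j-1,m}| \le \|\beta_{0,m} - \hat N\|_2^2 + \sum_{j=2}^m\hat w_j|\beta_{0,j,m} - \beta_{0,j-1,m}|.$$

Then, since $\hat N_j - \beta_{0,j,m} = \sqrt m\,M_n(I_{j,m}) = \int_0^1\lambda_{j,m}(t)\,dM_n(t)$ (see the proof of Lemma 2 in Appendix A.3), expanding the squares gives

$$\|\hat\beta - \beta_{0,m}\|_2^2 \le \sum_{j=2}^m\hat w_j\big(|\beta_{0,j,m} - \beta_{0,j-1,m}| - |\hat\beta_{j,m} - \hat\beta_{j-1,m}|\big) + 2\sum_{j=1}^m(\hat\beta_{j,m} - \beta_{0,j,m})\int_0^1\lambda_{j,m}(t)\,dM_n(t).$$

Assume that $\hat\beta$ belongs to a set of dimension at most $L_{\max}$. Let $\hat S = \{j : \hat\beta_{j,m} \ne \hat\beta_{j-1,m},\ j = 2,\ldots,m\}$ be the support of the discrete gradient of $\hat\beta$, and let $S$ denote the support of the discrete gradient of $\beta_{0,m}$. Using the Cauchy-Schwarz inequality, we have

$$\sum_{j=2}^m\hat w_j\big(|\beta_{0,j,m} - \beta_{0,j-1,m}| - |\hat\beta_{j,m} - \hat\beta_{j-1,m}|\big) \le \sum_{j\in S\cup\hat S}\hat w_j\big|(\hat\beta_{j,m} - \beta_{0,j,m}) - (\hat\beta_{j-1,m} - \beta_{0,j-1,m})\big| \le \sqrt2\sqrt{L_{\max} + 2(L_0 - 1)}\,\|\hat\beta - \beta_{0,m}\|_2\max_{j=1,\ldots,m}\hat w_j.$$

Hence

$$\|\hat\beta - \beta_{0,m}\|_2^2 \le \sqrt2\sqrt{L_{\max} + 2(L_0 - 1)}\,\|\hat\beta - \beta_{0,m}\|_2\max_{j=1,\ldots,m}\hat w_j + 2\|\hat\beta - \beta_{0,m}\|_2\int_0^1\sum_{j=1}^m\frac{\hat\beta_{j,m} - \beta_{0,j,m}}{\|\hat\beta - \beta_{0,m}\|_2}\lambda_{j,m}(t)\,dM_n(t).$$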


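Now, define the functional $G$ for all $\lambda_\beta \in \Lambda_m$ with $\beta \ne 0$ by $G(\lambda_\beta) = \frac{1}{\|\beta\|_2}\int_0^1\lambda_\beta(t)\,dM_n(t)$.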
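Therefore, we obtain

$$\|\hat\beta - \beta_{0,m}\|_2^2 \le \sqrt2\sqrt{L_{\max} + 2(L_0 - 1)}\,\|\hat\beta - \beta_{0,m}\|_2\max_{j=1,\ldots,m}\hat w_j + 2\|\hat\beta - \beta_{0,m}\|_2\,G(\hat\lambda - \lambda_{0,m}).$$

Let

$$\Gamma = \bigcup_{L=1}^{L_{\max}}V_L = \bigcup_{L=1}^{L_{\max}}\ \bigcup_{J\subset\{1,\ldots,m-1\},\,|J| = L}V_{L,J},$$

where $\{V_L : L = 1,\ldots,L_{\max}\}$ is the collection of the spaces to which $\hat\beta$ may belong and $V_{L,J}$ denotes a space of dimension $L$ containing signals with support $J$. It follows that

$$\|\hat\beta - \beta_{0,m}\|_2 \le \sqrt2\sqrt{L_{\max} + 2(L_0 - 1)}\max_{j=1,\ldots,m}\hat w_j + 2\sup_{\lambda\in\Gamma}G(\lambda). \tag{29}$$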


Then by Proposition 4 in [16], we have for any z > 0 
P 

where $\kappa = 11.8$. Then


sup G*(A) > K 
AeVi,j,||A||=l 


{L T .z) ^ 2^yrn(^L + z) 


n 


y/L‘. 


n 


< e 




E 


P 


L —l,...,Z/rnax 

1}, \ J\=L 


sup G(A) > K 
AeVt,j,||A||=l 


)(T 3” '^) _j_ 2-y/77i(T + z) 


re 




re 


< 


E 


L —l}.**?-^max 


Choosing $z = x + L_{\max}\log m$ for $x > 0$ leads to


E 


P 


L — l,...,Z/niax 

\J\=L 


< L p~^ 


sup G(A) > re 
AeVt,j,||A||=l 


)(L + X + Lmaxlogrre) 


re 


+ 


2y/m{L + X + Lmax log m) 


VL'. 


re 


Plugging this into inequality (29), we obtain, for any $x > 0$ and with probability larger than $1 - L_{\max}e^{-x}$,
11/3 - /JlL < \/2\/Lmax + 2(Lo - 1) max wj 


+ 2k 


+ 4k 


) (x + T 

max (1 + logm)) 


re 

^/m{x + Lmax(l + logm)) 


re 


and the result follows by using the inequality $(a + b + c)^2 \le 3(a^2 + b^2 + c^2)$ for all $a, b, c \in \mathbb{R}$.

□ 
8 Proof of Theorem 3

Let us first give the overall structure of the proof, which is inspired by [21]. In this proof, we repeatedly use the KKT optimality conditions of the optimization problem (25), given by Lemma 2 below. We also repeatedly use deviation arguments for the data-driven weights $\hat w_j$ and a control of the martingale noise, which are provided by Lemma 3 below. We prove the consistency of $\hat\jmath_\ell$, which estimates the index $j_\ell$ (where $j_\ell/m$ is the right-hand side boundary of the interval $I_{j_\ell,m}$), by showing that $\mathbb{P}[A_{n,\ell}] \to 0$ as $n \to \infty$, where $A_{n,\ell} := \{|\hat\jmath_\ell - j_\ell| > m\varepsilon_n/2\}$ for all $\ell \in \{1,\ldots,L_0 - 1\}$. We treat separately two cases depending on the positions of $\hat\jmath_\ell$ and $j_\ell$. In Case I, we consider $\hat\jmath_\ell < j_\ell$, see Section 8.1 and Figure 6. In Case II, we consider $\hat\jmath_\ell > j_\ell$, see Appendix B and Figure 7. We decompose even further, using the quantity $\Delta_{j,\min}$ (see Section 4), by defining the set $C_n = \{\max_{1\le\ell\le L_0-1}|\hat\jmath_\ell - j_\ell| < \Delta_{j,\min}/2\}$. We prove that $\mathbb{P}[A_{n,\ell}\cap C_n] \to 0$ and $\mathbb{P}[A_{n,\ell}\cap C_n^c] \to 0$ as $n \to \infty$, for Case I in Sections 8.1.1 and 8.1.2, and for Case II in Appendices B.1 and B.2.


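Figure 6: Case I. $\hat\jmath_\ell < j_\ell$ (schematic showing the change-points $\tau_{0,\ell}$, $\tau_{0,\ell+1}$ and the corresponding grid intervals).

Figure 7: Case II. $\hat\jmath_\ell > j_\ell$ (schematic showing the change-points $\tau_{0,\ell-1}$, $\tau_{0,\ell}$ and the corresponding grid intervals).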


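Lemma 2. Consider the total-variation penalized problems in (21) and (25). Let $\hat\beta = [\hat\beta_{j,m}]_{1\le j\le m}$ and $\hat\mu = [\hat\mu_{j,m}]_{1\le j\le m}$ denote the respective solutions. Then, these vectors and the approximate change-point sequence estimators $(\hat\jmath_r)_r$ satisfy, for all $r = 1,\ldots,|\hat S|$,

$$\sum_{j=\hat\jmath_r}^m\beta_{0,j,m} - \sum_{j=\hat\jmath_r}^m\hat\beta_{j,m} + \sqrt m\sum_{j=\hat\jmath_r}^m M_n(I_{j,m}) = \hat w_{\hat\jmath_r}\operatorname{sign}(\hat\mu_{\hat\jmath_r,m}), \tag{30}$$

and, for all $j \in \{1,\ldots,m\}$,

$$\Big|\sum_{q=j}^m\beta_{0,q,m} - \sum_{q=j}^m\hat\beta_{q,m} + \sqrt m\sum_{q=j}^m M_n(I_{q,m})\Big| \le \hat w_j, \tag{31}$$

using the convention $\operatorname{sign}(\hat\mu_{j,m}) = +1$ if $\hat\mu_{j,m} > 0$ and $-1$ otherwise. The vectors $\hat\beta$ and $\beta_{0,m} = [\beta_{0,j,m}]_{1\le j\le m}$ have the following additional properties:

$$\hat\beta_{q,m} = \hat\beta_{\hat\jmath_r-1,m},\ \text{if } \hat\jmath_{r-1} \le q \le \hat\jmath_r - 1,\ \text{for } r = 1,\ldots,L,$$
$$\beta_{0,q,m} = \beta_{0,j_\ell-1,m},\ \text{if } j_{\ell-1} + 1 \le q \le j_\ell - 1,\ \text{for } \ell = 1,\ldots,L_0 - 1. \tag{32}$$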
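The optimality conditions behind Lemma 2 can be checked mechanically for a candidate solution. The sketch below is ours and only verifies the subgradient conditions (it does not solve the problem); it assumes the objective $\frac12\|y - T\mu\|_2^2 + \sum_{j\ge2}w_j|\mu_j|$, for which the conditions read $(T^\top(y - T\mu))_1 = 0$, $(T^\top(y - T\mu))_j = w_j\operatorname{sign}(\mu_j)$ if $\mu_j \ne 0$ and $|(T^\top(y - T\mu))_j| \le w_j$ otherwise. Without the factor $\frac12$ in front of the squared norm, the bounds become $w_j/2$. The name `satisfies_kkt` is hypothetical.

```python
import numpy as np

def satisfies_kkt(y, mu, w, tol=1e-8):
    """Check the subgradient optimality conditions of
    0.5 * ||y - T mu||^2 + sum_{j >= 2} w_j |mu_j|  (w[0] is not used)."""
    y, mu, w = (np.asarray(v, dtype=float) for v in (y, mu, w))
    residual = y - np.cumsum(mu)              # y - T mu
    g = np.cumsum(residual[::-1])[::-1]       # (T^T residual)_j = sum_{q >= j} residual_q
    if abs(g[0]) > tol:                       # mu_1 is unpenalized: zero gradient
        return False
    for j in range(1, len(y)):
        if mu[j] != 0.0:
            if abs(g[j] - w[j] * np.sign(mu[j])) > tol:
                return False
        elif abs(g[j]) > w[j] + tol:
            return False
    return True

# With very large weights, the constant fit (all increments but the first equal to 0) is optimal.
y = [1.0, 2.0, 3.0, 4.0]
print(satisfies_kkt(y, [2.5, 0.0, 0.0, 0.0], [0.0, 100.0, 100.0, 100.0]))  # True
```

The reverse cumulative sums of the residuals play the role of the partial sums $\sum_{q\ge j}$ appearing in (30) and (31).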


The proof of Lemma 2 is given in Appendix A.3. Let us now state a lemma which 
allows us to control the martingale noise term. 

Lemma 3. Given two integers $a$ and $b$ such that $1 \le a \le b \le m$, let $M_n(a;b) := \sum_{q=a}^b M_n(I_{q,m})$. Then, for all $z > 0$, we have

$$\mathbb{P}\big[|M_n(a;b)| > z\big] \le 2\exp\Big(-\frac{nz^2}{2\int_{(a-1)/m}^{b/m}\lambda_0(t)\,dt + \frac23 z}\Big), \tag{33}$$
and for all $\xi > 0$, the data-driven weight $\hat w_a$ satisfies


P 


.2 ^ mlogm 
w„ > -I f - 


n 


\o{t)dt 


/I 2 ^ 

'' m ’ J 


< 2 exp — 




^ -^1,0^ Ao(t)(it + 3^ 

\ m ’-^-1 


2d: r 


(34) 


where $\int_I\lambda_0(t)\,dt = \mathbb{E}[N(I)]$ for any $I \subset [0,1]$.
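To get a feel for (33), the sketch below (ours) compares its right-hand side with a Monte Carlo estimate. It makes an extra assumption not required by the lemma: the $N_i$ are i.i.d. Poisson processes, so that each count $N_i(((a-1)/m, b/m])$ is Poisson with mean $\int_{(a-1)/m}^{b/m}\lambda_0(t)\,dt$. The function names and parameter values are illustrative.

```python
import numpy as np

def bernstein_bound(n, expected_count, z):
    """Right-hand side of (33), with expected_count = int_{(a-1)/m}^{b/m} lambda_0(t) dt."""
    return 2.0 * np.exp(-n * z ** 2 / (2.0 * expected_count + 2.0 * z / 3.0))

def empirical_tail(n, expected_count, z, reps=20000, seed=0):
    """Monte Carlo estimate of P[|M_n(a;b)| > z] when each N_i is a Poisson process,
    so that N_i(((a-1)/m, b/m]) ~ Poisson(expected_count) independently over i."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(expected_count, size=(reps, n))
    m_n = counts.mean(axis=1) - expected_count     # one draw of M_n(a;b) per row
    return float(np.mean(np.abs(m_n) > z))

bound = bernstein_bound(50, 2.0, 0.5)
freq = empirical_tail(50, 2.0, 0.5)
```

The bound is conservative in this regime, which is expected from a Bernstein-type inequality; its value lies in being uniform and explicit in $n$, $z$ and the compensator.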


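The proof of Lemma 3 is given in Appendix A.4. Let us now prove Theorem 3. Recall that the sequence $(\varepsilon_n)_n$ satisfies $m\varepsilon_n \ge 6$ for all $n \ge 1$. An application of the triangle inequality entails that

$$\mathbb{P}\Big[\max_{1\le\ell\le L_0-1}|\tau_{0,\ell} - \hat\tau_\ell| > \varepsilon_n\Big] \le \mathbb{P}\Big[\max_{1\le\ell\le L_0-1}\Big|\tau_{0,\ell} - \frac{j_\ell}{m}\Big| > \frac{\varepsilon_n}{2}\Big] + \mathbb{P}\Big[\max_{1\le\ell\le L_0-1}\frac{|j_\ell - \hat\jmath_\ell|}{m} > \frac{\varepsilon_n}{2}\Big].$$

Moreover, the true change-point $\tau_{0,\ell}$ verifies (14), which implies that

$$\mathbb{P}\Big[\max_{1\le\ell\le L_0-1}|\tau_{0,\ell} - \hat\tau_\ell| > \varepsilon_n\Big] \le \mathbb{P}\Big[\max_{1\le\ell\le L_0-1}|\hat\jmath_\ell - j_\ell| > \frac{m\varepsilon_n}{2}\Big].$$

Since

$$\mathbb{P}\Big[\max_{1\le\ell\le L_0-1}|\hat\jmath_\ell - j_\ell| > \frac{m\varepsilon_n}{2}\Big] \le \sum_{\ell=1}^{L_0-1}\mathbb{P}\Big[|\hat\jmath_\ell - j_\ell| > \frac{m\varepsilon_n}{2}\Big],$$

it suffices to prove that, for all $\ell = 1,\ldots,L_0 - 1$, $\mathbb{P}[A_{n,\ell}] \to 0$ as $n$ tends to infinity.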
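The reduction above is elementary grid arithmetic. As a hedged illustration (ours), assume that (14) amounts to $|\tau_{0,\ell} - j_\ell/m| \le 1/m$, which holds when $j_\ell = \lceil m\tau_{0,\ell}\rceil$, and that $\hat\tau_\ell = \hat\jmath_\ell/m$. Then $m\varepsilon_n \ge 6$ gives $1/m \le \varepsilon_n/6$, so every $\hat\jmath_\ell$ with $|\hat\jmath_\ell - j_\ell| \le m\varepsilon_n/2$ yields $|\tau_{0,\ell} - \hat\tau_\ell| \le \varepsilon_n$. The helper names are hypothetical.

```python
import math

def grid_index(tau, m):
    """Index j such that tau lies in I_{j,m} = ((j - 1)/m, j/m].
    (Floating-point products m * tau landing exactly on a grid point may need care.)"""
    return math.ceil(m * tau)

def reduction_holds(tau, m, eps_n):
    """Check that every j_hat with |j_hat - j| <= m * eps_n / 2 satisfies
    |tau - j_hat / m| <= eps_n, where j = grid_index(tau, m)."""
    j = grid_index(tau, m)
    radius = math.floor(m * eps_n / 2)
    return all(abs(tau - j_hat / m) <= eps_n
               for j_hat in range(j - radius, j + radius + 1))
```

With $m = 50$ and $\varepsilon_n = 6/50$, the check passes for every change-point location of the form $t/97$, $t = 1,\ldots,96$.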

8.1 Case I 

Since $m\varepsilon_n \ge 6$ for all $n \ge 1$, it follows that the event $\{\hat\jmath_\ell < j_\ell - 2\}$ holds a.s. on $A_{n,\ell}\cap\{\hat\jmath_\ell < j_\ell\}$.

8.1.1 Step I.1. Prove: $\mathbb{P}[A_{n,\ell}\cap C_n] \to 0$, as $n \to \infty$.

By the definition of $C_n$, we have

$$j_{\ell-1} < \hat\jmath_\ell < j_{\ell+1},\quad\text{for all } \ell = 1,\ldots,L_0 - 1. \tag{35}$$

Applying (31) in Lemma 2 with $j = j_\ell$ and $j = \hat\jmath_\ell + 1$, we obtain


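$$-(\hat w_{j_\ell} + \hat w_{\hat\jmath_\ell+1}) \le \sum_{q=\hat\jmath_\ell+1}^{j_\ell-1}\beta_{0,q,m} - \sum_{q=\hat\jmath_\ell+1}^{j_\ell-1}\hat\beta_{q,m} + \sqrt m\sum_{q=\hat\jmath_\ell+1}^{j_\ell-1}M_n(I_{q,m}) \le \hat w_{j_\ell} + \hat w_{\hat\jmath_\ell+1}.$$

Put $\hat w_{a,b} := \hat w_a + \hat w_b$ for any two integers $a$ and $b$. Thus

$$\Big|\sum_{q=\hat\jmath_\ell+1}^{j_\ell-1}\big(\beta_{0,q,m} - \hat\beta_{q,m} + \sqrt m\,M_n(I_{q,m})\big)\Big| \le \hat w_{\hat\jmath_\ell+1,j_\ell}.$$

Using the property of the vector $\hat\beta$ in Lemma 2, we get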

{je. - je - m) + VmMnije + l]ji - 1) 


^ ^k+i,k- 
Therefore, on $C_n\cap\{\hat\jmath_\ell < j_\ell - 2\}$, we have

{ji — je — — /3oj>+i-i,m) 


T {ji ji 2)(/3oj'^_,_j—l,m /^0,j£ —l,m) 

+ ^/mMniji + l]ji - 1 ) 




Define the event

^n,l 


{ji ji ‘^){ijj^^^-i^rn /^0,j^+i-l,m) 

T {ji ~ ji ~ 2) (/3o—l,m ~ /3o,j£ —l,m) 
+ y/mMn{ji + 1; - 1) + 


- '^jl+hjl (’ 


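We observe that $C_{n,\ell}$ occurs with probability one. In addition, we remark that, for all $n \ge 1$, $m\varepsilon_n \ge 6$ entails $m\varepsilon_n/2 - 2 \ge m\varepsilon_n/6$. Then

$$\Big\{|j_\ell - \hat\jmath_\ell| > \frac{m\varepsilon_n}{2}\Big\} \subset \Big\{|j_\ell - \hat\jmath_\ell - 2| > \frac{m\varepsilon_n}{2} - 2\Big\} \subset \Big\{|j_\ell - \hat\jmath_\ell - 2| > \frac{m\varepsilon_n}{6}\Big\}.$$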

Therefore 

T[An/ n c„ n c„,^] 


{lit - jfl > c ||i£ 


< p 


w 


'je+i,je ^ I/^0,J>+1-I,m Po,ji-l,r 


\ji-ji - 21 


■| n {je < je - 2} 


+ p 


+ p 


\Q O I \ J 

^ t n On 


VmMn{je + l;jt - 1) 


'-o.> 

1 

1 

to 

3 j 


$=: \mathbb{P}[A_{n,\ell,1}] + \mathbb{P}[A_{n,\ell,2}] + \mathbb{P}[A_{n,\ell,3}].$

Moreover, we have 


< IP 

< p 

< p 


_^je+i,je — ig 

mentis,min 


> 


menA/s^j 


rs.+i 2 36 

771 ^/\^ -\ 


- 362 


> 


nmei/^A 


By (16) in Assumption 4, and (34) in Lemma 3 with ^ 1])] 4^ 

follows that 


]P[-4n,t,i] < 2exp^- 


<2 


2E 


iVn(f^,l 


+ ie 


0 , 






































as $n \to \infty$. Next, consider the event

^/mMn{ji + l;je - 1) 




n,£,3 — 


c 


J£ -J£-2 

Mn{je + ~ 1) 

Mn{je + 1; - 1) 

u 

<?—r<-i+2 


> 


l/3oj>+i-l,m f^0,je-l,r 


> 


> 


je - ji-"^ 

^£n^/3,min 

18^/m 


3y/m 

n-3 

n U = 4 


_ II ) \ iC/T £ ■ ™£n^/3,m 


Put ^pn = • By (33) in Lemma 3, we have 


2 

P[^nA3] < 2 y~] exp 

q=j(_-\+2 


mpr 


2E 


A^n 


g-1 


m ’ m 


_L 2,^ 

+ 


< 2(j> - J >_1 - 3)exp - 




2E 




je-i+£ ji—1 
m ’ m 


_L 2,„ 

+ 


< 2 exp — 


np>l 


2E 


iV„ 


J^-1 + 1 jg —1 
m ’ m 


I 2,^ 
+ 3 


+ log m . 


By (16) in Assumption 4, this implies that $\mathbb{P}[A_{n,\ell,3}]$ goes to zero as $n \to \infty$. We now control $\mathbb{P}[A_{n,\ell,2}]$. Using Lemma 2 with $j = \lceil(j_\ell + j_{\ell+1})/2\rceil$ and with $j = j_\ell + 1$, and using the triangle inequality, it follows that


rA±|i+i]_i 

y~! Ng - ^ /3g^m 

g=h+i g=h+i 




Furthermore, on the event $C_n\cap\{\hat\jmath_\ell < j_\ell - 2\}$, the following inequalities

^ ^ rjr + jr+i-i 1 ^ • 1 

Je < J£ < q < \ -- I - 1 < J£+i - 1 , 


hold true. Moreover, we note that ^ A jg < q < — 1 < jr+i — 1. 

Consequently, we have 


fcl - it - ^ + 1. _ 1) 




which implies that 

0^+1 - - 2) . + 


VmMnijg + 1 ; - 1 ) 
Therefore, we may upper bound $\mathbb{P}[A_{n,\ell,2}]$ as follows


= P 

= P 

< P 


T|3 q I ^ „ 


> {k+i - k - 2 ) 




+ |\/mMn(j£ + 1; ^2^^ ^ ~ 


> {k+i - k - 2 ) 






nc„ 


< P 
+ P 

< p 


j- j^+jv+i 1 — (i-^+i 2) 


|/^0j>+i-l,m /3o,j>-l,r 

12 


/— i~/r !' II vk~^ J^+1-1 i\ ^ • r)\ —1,' 

kmMniji + 1; r-T-1 - 1) > [je+i -j£-2) - 


12 


+ P 


(^_7,min 2)Ap 

,ri 


12 


Mnik + 1 ; - 1 ) 


> 


( Aji'^min 2)A;3 ,r 

12 x/rn 


On the other hand, it is easy to see that (13) in Assumption 3 yields that Aj^min — 2 > 


_ 2 > Az^. Thus 


— 6 
^[kn,i,2] < IP 


riz+i,r^^i - 


+ p 


Ajf^min A^ ji 

72 

k + k+i - 


Mkk + k r ' 1-1) 


> 


Aj^minA^^njin 

72-y/m 


“n,£,2 “n/,2- 

Using the property of the data-driven weights, we remark that 

A2 . . 1 


“n,€,2 — ^ 


-2 “7,min'^/3,min 

,”’*+■ - —ni5— 


By (17) in Assumption 4, and (34) in Lemma 3 with ^ 
it follows that 


- +E[A^((^,1])], 


«il 2 A 2exp - 




2E 






0 , 


as n —)• cx). Similarly, using (17) in Assumption 4, and (33) in Lemma 3 with z = 
^'’ 72 ”^’"“' ’ implies that 


a. 


( 2 ) 

n,£,2 


< 2 exp — 


nz 


2E 


r H+H+l n -, 

(^,J- 2 -^ 

\ \ m ’ m 


0, 


+ 


as $n \to \infty$. Therefore, we conclude that $\mathbb{P}[A_{n,\ell,2}] \to 0$ as $n \to \infty$.
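8.1.2 Step I.2. Prove: $\mathbb{P}[A_{n,\ell}\cap C_n^c] \to 0$, as $n \to \infty$.

Recall that $C_n^c = \{\max_{1\le k\le L_0-1}|\hat\jmath_k - j_k| \ge \Delta_{j,\min}/2\}$. We split $\mathbb{P}[A_{n,\ell}\cap C_n^c]$ into three terms as follows:

$$\mathbb{P}[A_{n,\ell}\cap C_n^c] = \mathbb{P}[A_{n,\ell}\cap D_n^{(l)}] + \mathbb{P}[A_{n,\ell}\cap D_n^{(m)}] + \mathbb{P}[A_{n,\ell}\cap D_n^{(r)}],$$

where

$$D_n^{(l)} := \{\exists\,\ell\in\{1,\ldots,L_0 - 1\} : \hat\jmath_\ell \le j_{\ell-1}\}\cap C_n^c,$$
$$D_n^{(m)} := \{\forall\,\ell\in\{1,\ldots,L_0 - 1\} : j_{\ell-1} < \hat\jmath_\ell < j_{\ell+1}\}\cap C_n^c,$$
$$D_n^{(r)} := \{\exists\,\ell\in\{1,\ldots,L_0 - 1\} : \hat\jmath_\ell \ge j_{\ell+1}\}\cap C_n^c.$$

Let us first focus on $\mathbb{P}[A_{n,\ell}\cap D_n^{(m)}]$. Observe that

$$\mathbb{P}[A_{n,\ell}\cap D_n^{(m)}] = \mathbb{P}\big[A_{n,\ell}\cap\{\hat\jmath_{\ell+1} - j_\ell \ge \Delta_{j,\min}/2\}\cap D_n^{(m)}\big] + \mathbb{P}\big[A_{n,\ell}\cap\{\hat\jmath_{\ell+1} - j_\ell < \Delta_{j,\min}/2\}\cap D_n^{(m)}\big].$$

The fact that $0 < \hat\jmath_{\ell+1} - j_\ell < \Delta_{j,\min}/2$ yields $j_{\ell+1} - \hat\jmath_{\ell+1} > \Delta_{j,\min}/2$. Indeed, it is easy to see that

$$j_{\ell+1} - \hat\jmath_{\ell+1} = (j_{\ell+1} - j_\ell) - (\hat\jmath_{\ell+1} - j_\ell) > \Delta_{j,\min} - \frac{\Delta_{j,\min}}{2} = \frac{\Delta_{j,\min}}{2}.$$

Hence

$$\mathbb{P}[A_{n,\ell}\cap D_n^{(m)}] \le \mathbb{P}\big[A_{n,\ell}\cap\{\hat\jmath_{\ell+1} - j_\ell \ge \Delta_{j,\min}/2\}\cap D_n^{(m)}\big] + \mathbb{P}\big[A_{n,\ell}\cap\{j_{\ell+1} - \hat\jmath_{\ell+1} > \Delta_{j,\min}/2\}\cap D_n^{(m)}\big].$$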


Moreover, we note that 




m) 


Lq—2 


c 


U {> 


- jr > 


A 


‘j,min 


r=^-\-l 


) ^ 


jr+l jr A 


A 


■j,min 


}nzi^ 


m) 


Thus, we have 


Lq —2 

F[An,e n ZZ(r)] < F[An,e n Be+i,e n P[C,,, n Bs+i,s n 

s=e+i 


(36) 


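where

$$B_{p,q} = \Big\{\hat\jmath_p - j_q \ge \frac{\Delta_{j,\min}}{2}\Big\},\ \text{with the convention } B_{L_0,L_0-1} = \Big\{m - j_{L_0-1} \ge \frac{\Delta_{j,\min}}{2}\Big\},\qquad C_{p,q} = \Big\{j_p - \hat\jmath_q \ge \frac{\Delta_{j,\min}}{2}\Big\}.$$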

Let us now prove that the first term on the right-hand side of (36) goes to zero as $n$ tends to infinity, the arguments for the other terms being similar. Using (31) in Lemma 2 with $j = \hat\jmath_\ell + 1$ and $j = j_\ell$ on the one hand, and (31) in Lemma 2 with $j = j_\ell + 1$ and $j = \hat\jmath_{\ell+1}$ on the other hand, we obtain, respectively,

\h - ji - + {V^MnCje + 1; “ 1)|, 


(37) 
and


\k+i - k - + \VmMn{ji + l;]e+i - 1)1- 

Besides, we have 



7 

o' 

1 

= 


< 

1/3'- - 



< 



3i+i 


\ji-je - 21 


+ 


+ 


kmMnjji + l]ji - 1)1 
\je - J> - 2| 

%+i,i<+i , WrnMnUe + kJe+i - 1)| 


+ 


\je+i - k - 2| \k+i - k - 2| 


< , VmMnije + l;je - 1)| 

— men ' 

6 


\ji-je - 21 


, %+i,i^+i , I VmMn(j£ + l;j£+i - 1)1 

' A . . “T 


\je+i - k - 2| 


Define the event $E_{n,\ell}$ by

^n,(. — / |/5oj^+l —l,m (3oji — l,m\ k 


“T A . 


me-n 

6 


+ 


+ 


y/rnMn{ji + l]ji> - 1) 


jt - J> - 2 
kmMniji + kji+i - 1 ) 


k+i - if - 2 

We observe that $E_{n,\ell}$ occurs with probability one. Therefore, we obtain

n n Di™)] 

< P \En/ n {{ji - je) > n {{ji+i - j 

—l,m Po,j£ — l,rr, 


< P 
+ P 
+ P 






24 


—l,m Po,j( — l,r 


24 


kmMnUe + 1; if - 1) 


> 


k - if - 2 


n{i 


if - if - 2 > 


me,i 


+ P 


y/mMn{je + l;if+i - 1) 


> 


if+i - if - 2 


n{i 


if+1 - if - 2 > 


A 




6 


(38) 


^n,f,l + ^n,f,2 + ^n,f,3 + ^n,f,4 
We note that


7n,£,l ^ 


< P 


mentis y 


< P 


™2 2 a 2 

^2 ^ ^ ^n^B,min 

> -J^2- 


TXTTXS^ 

Using (16) in Assumption 4, and (34) in Lemma 3 with ^ = 4 g^\ogm^ 
we get 


N„ 


31-1 
n\ \ m, ! 


dn,i,i ^ 2 exp - 




2E 






0 , 


as $n \to \infty$. Analogously,




‘Ajjmin^/3,min 


rs.+i 2 48 


< p 


A2 a2 

-2 ^ 7,min 5,min 

ri.+i ^ — 

nA? Ao 


Using (17) in Assumption 4, and (34) in Lemma 3, with ^ ‘ 48C"iogm" + ^[^^((m> ^D]’ 

we have 


^n ,£,2 < 2 exp - 




2E 






0, 


as $n \to \infty$. Furthermore, using (33) in Lemma 3, we have

^£n^/3,mir 


0n,e,3 < P 


[ji + i;j> - 1 


> 


24-y/m 


< 2 exp — 


nipl 


2E 




m ’ m 


+ 


+ log m , 


where $\psi_n$ denotes the threshold in the previous display. By (16) in Assumption 4, we get that $\theta_{n,\ell,3} \to 0$ as $n \to \infty$.

Similarly, using (33) in Lemma 3, we have 


On,£,4: < U 


|-^n(j> + 1; jf+1 - 1)| > 


^ j, min “A ^ ^ Jilin 


24\/m 


< 2 exp — 


n5l 


2E 


at (/k ii±^ 

\ \ m ’ m 


+ 


+ log m , 


where $\delta_n$ denotes the threshold in the previous display. By (17) in Assumption 4, we get that $\theta_{n,\ell,4} \to 0$ as $n \to \infty$.

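Consequently, we obtain $\mathbb{P}[A_{n,\ell}\cap D_n^{(m)}] \to 0$ as $n \to \infty$. Now, we have $\mathbb{P}[A_{n,\ell}\cap D_n^{(l)}] \le \mathbb{P}[D_n^{(l)}]$ and

$$\mathbb{P}[D_n^{(l)}] = \mathbb{P}\big[\{\exists\,\ell\in\{1,\ldots,L_0 - 1\} : \hat\jmath_\ell \le j_{\ell-1}\}\cap C_n^c\big] = \mathbb{P}\Big[\bigcup_{\ell=1}^{L_0-1}\big\{\max\{1\le q\le L_0 - 1 : \hat\jmath_q \le j_{q-1}\} = \ell\big\}\cap C_n^c\Big] \le \sum_{\ell=1}^{L_0-1}\mathbb{P}\big[\{\max\{1\le q\le L_0 - 1 : \hat\jmath_q \le j_{q-1}\} = \ell\}\cap C_n^c\big].$$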
We note that, on the event $\{\max\{1\le q\le L_0 - 1 : \hat\jmath_q \le j_{q-1}\} = \ell\}$, we clearly have $\hat\jmath_\ell \le j_{\ell-1}$ and $\hat\jmath_{q+1} > j_q$ for all $q = \ell,\ldots,L_0 - 1$. Then, it follows that


Lq — 1 


£=1 


Lq — l 

2 ^-ip [ f] {je < j,_i} n {j,+i > jq} 
q>e 


In addition, we note that 


Lq —1 

n < je-l} n {jq+l > jq} 

q>e 

c u, <j,]n({],» > u {i,,. < 


n I {i,« > u {j,« < 


n.... n u + 

n ({3 l, > u 

_ r . ^ . ^i,min -1 „ / r ^ ^7,min , r • ~ . ^7,min 

c [je -31> —} n -3e> —} u {je+i - je+i > — 

^ 7 ,min-, , , r • ^ ^ ^ 7 ,min 

u {je +2 - 3e+2 > 


n |j £+2 - Je+i > 


n . . . . n ( {jLo-l - 3 Lq-2 > } U {jLo-2 - jLo-2 > 


j,min 


( {jLo “ 3 Lo -1 > -^^} U {jLo -1 - 3 Lo-l > 


A 


j,mm 


Lq—2 


c 


U(( 


q=t 


~ ^ ^7,min 7 „ r "• • ^ '^7,min i \ , r ■ "• ^ ^7,min 7 

3 q- 3 q> n Vq +1 “ ^ ^Q-l “ 3 Lo -1 > 


Hence 


Lo- 2 Lo- 2 . . 

P[D„''>| < 2"»-= p[{(i, - 3,) > ^} n {3,+. - 7 , > ^} 

£=1 q=e 


( 39 ) 


+ 2^““2p 


r . Aj'^IIlin 

ijLo -1 - jLo -1 > -^- 1 
Consider the first term of the sum on the right-hand side of (39). Using (37) and (38) with $\ell = q$, we obtain


P 




A 


Jq Jq ^ 


j,nim 


}" {i’ 


Jq+1 Jq > 


A 


j,min 


< P 


W 


’~jq + i,jq ^ l/^0j5 + l-l,m P 0 ,jq-l,r. 


+ P 

+ P 


W 


+ + l ^ l/^0j5 + i-l,m Pojq-l,' 


VmMnUq + l-,jq-l) 


jq jq 2 


> 




n{i 


Jq jq ^ 




+ P 


VmMnijq + l]jq+l “ 1) 


jq+1 jq 2 


> 


\fjo,jg+i-l,m Ijojq-l,'. 


n{i 


Jq+1 jq ^ 




•— ^n,q,l Qn,q,2 ^n,q,3 U (^n,q,A- 

By (33)-(34) in Lemma 3 and (16)-(17) in Assumption 4, we show that, for $s = 1,\ldots,4$, $\theta_{n,q,s} \to 0$ as $n$ tends to infinity. Then


P 


^ ^7,min I f ^ ^7,min f 

Jq — Jq > 2 / I 1 ~ ^ J 


0 . 


Let us now consider the last term on the right-hand side of (39). Using the observations (37) and (38) with $\ell = L_0 - 1$ leads to

A,- 


P 


{i 


jLo-l - jLo-1 > 




< p 


w 


'Jlq-i+IJlq-i ^ fjo,jLQ-i-l,r 


mSn 

6 


+ p 

+ p 


w 


Jlq- 

A. 


-1+1,m ^ |/3ojyg-l,m /5ojip_i-l,! 


y/mMn{jLq-l + jLq-l “ 1) 


j+o-l - j+o-l - 2 


> 


n{^ 


jLo-l “ jLo-1 > 


A.i rrii 


j,mm 


[f 

y/rnMn{jLo-i + l-,m-l) 


[1 

m - j+o-i - 2 

iJ 


+ p 


•— ^n,Lo — 1,1 T ^n,Lo — 1,2 T ^n,Lo — 1,3 U ^n,Lo — 1 , 4 - 

By (33)-(34) in Lemma 3 and (16)-(17) in Assumption 4, we obtain, for $s = 1,\ldots,4$, $\theta_{n,L_0-1,s} \to 0$ as $n \to \infty$. Then

Aj min 1 ( A 


p 




jLo-1 - jLo-1 > 


}n{ 


m- jLo-i > 


j.,mm 


0 . 
This implies that $\mathbb{P}[D_n^{(l)}] \to 0$ as $n \to \infty$. Similarly, we prove that $\mathbb{P}[D_n^{(r)}] \to 0$ as $n \to \infty$, which yields that $\mathbb{P}[A_{n,\ell}\cap C_n^c] \to 0$ as $n \to \infty$. This concludes the proof of Theorem 3, up to the case $\{\hat\jmath_\ell > j_\ell\}$ for a fixed $\ell \in \{1,\ldots,L_0 - 1\}$, which is given in Appendix B.


9 Proof of Theorem 4

This proof is based on the same arguments as the proof of Theorem 3. Let $\mathcal{T}_0^{\mathrm{approx}} = \{j_1/m,\ldots,j_{L_0-1}/m\}$ be the set of approximate change-points. First, we note that


P 


£{T\\To) > e. 


< P 


£{T\\To 


•approx \ 


> Sri 


+ P 


T(ro"pp™")||ro) > 


Obviously, since men > 6, we have P T(7^‘^'’^™^)||7c 
the inequality L < m holds true. In order to prove 

p[{T(r||VPP™") > e„} {l > Lo - 1 

as n —)■ oo, it is enough to prove that 


= 0. It is 


clear to remark that 


^}] 


■ 0 , 


P 




as n 
P 


■ oo. We have that 


7-”") > e„} fl {io - 1 < i < ”}' 


■ 0 , 


[{sifK'””-") > e„} n {io - 1 < r < ™} 

< P[{«(r||T;»»'“) >£„}n{lL.L.-i}] +p[{f(t|| 7 ?'«"”) > e„}n{li>L.-i}’ 

m 

■{£(ri|T;»»'“) > £„} n + E ^ 

L=Lr 


< P 

< P 


{£(ri|T;»»'“)>£„}n{iL.L.-i}] + 

m Lq— 1 o . 

+ Z E lP[VgG{I,...,L},|^-^| 

L mm 

L=Lq i=\ 


p 


T(t|| > En 


(40) 




L=Lq i=\ 

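The first term on the right-hand side of (40) tends to zero as $n$ tends to infinity, since it is upper bounded by $\mathbb{P}[\max_{1\le\ell\le L_0-1}|\hat\jmath_\ell - j_\ell| > m\varepsilon_n]$, which tends to zero by the proof of Theorem 3. Let us now focus on the second term on the right-hand side of (40). Note that

$$\sum_{L=L_0}^m\sum_{\ell=1}^{L_0-1}\mathbb{P}\big[\forall\,1\le q\le L,\ |\hat\jmath_q - j_\ell| > m\varepsilon_n\big] \le \sum_{L=L_0}^m\sum_{\ell=1}^{L_0-1}\big(\mathbb{P}[R_{n,\ell,1}] + \mathbb{P}[R_{n,\ell,2}] + \mathbb{P}[R_{n,\ell,3}]\big),$$

where

$$R_{n,\ell,1} := \{\forall\,1\le q\le L : |\hat\jmath_q - j_\ell| > m\varepsilon_n \text{ and } \hat\jmath_q < j_\ell\},$$
$$R_{n,\ell,2} := \{\forall\,1\le q\le L : |\hat\jmath_q - j_\ell| > m\varepsilon_n \text{ and } \hat\jmath_q > j_\ell\},$$
$$R_{n,\ell,3} := \{\exists\,1\le q\le L - 1 : |\hat\jmath_q - j_\ell| > m\varepsilon_n,\ |\hat\jmath_{q+1} - j_\ell| > m\varepsilon_n, \text{ and } \hat\jmath_q < j_\ell < \hat\jmath_{q+1}\}.$$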
Note that

$$\mathbb{P}[R_{n,\ell,1}] = \mathbb{P}\big[R_{n,\ell,1}\cap\{\hat\jmath_L > j_{\ell-1}\}\big] + \mathbb{P}\big[R_{n,\ell,1}\cap\{\hat\jmath_L \le j_{\ell-1}\}\big].$$

By applying (31) in Lemma 2 with $j = j_\ell$ and with $j = \hat\jmath_L + 1$ in the case where $\hat\jmath_L > j_{\ell-1}$, it follows that, with probability one,

{ji ~ jL — 2) ((/3o,j^-l,m ~ /3oj>+i-l,m) 

+ y/mMn{jL + 1; - 1) 




Thus 


P 


Rn,t,l n [jL > jl-l] 


< p 
+ p 
+ p 


w 


'jL+l,jt y l/^0j(.+i-l,m /3o,je-l,r. 


me„ — 2 


\^jL+i-l,m ljo,ji+i-l,m\ > 


-} n {ji > ji-i} 




{ 




:=iP[<li]+iP[<li]+iP[<h]' 


ji - jL-‘^ 
?( 2 ) 


3y/m 


} 

■}n{b£ 


-jil > mer. 


(3) 


Using (16) in Assumption 4, and (33)-(34) in Lemma 3 with ^ +P [Nn ((, 1])], 

we prove that Y1T=Lo ^ ^] —)• 0, as n —)■ oo. Let us now consider to P 2 ] • 

Using (31) in Lemma 2 with $j = j_\ell + 1$ and with $j = j_{\ell+1}$, we get

{ji+l - ji - ‘^)\Pj^+,-l,m - /3oj,+i-l,m| < Wj,+l,j,+^ + \^/mMn{ji + l;j£+l “ 1)|. 
Therefore, we may upper bound $\mathbb{P}[R^{(2)}_{n,\ell,1}]$ as follows:


P 


1/3: 


A+i-i,™ 


— /3o,, 




l/3o.. 




1^0, ji- 




< P 


j,+i > {ji+l - ji - 2) 


l/3o.. 






,jt-l,m\ 


6 


+ P 


Mn{ji + ji+l — ^) ^ \fjo,jt+i-l,m (jo,ji-l,r. 


ji+l — ji — 


6y/m 


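By using Lemma 2 and (16)-(17) in Assumption 4, we conclude that $\sum_{L=L_0}^m\sum_{\ell=1}^{L_0-1}\mathbb{P}[R^{(2)}_{n,\ell,1}] \to 0$ as $n \to \infty$. Analogously, it can be shown that $\sum_{L=L_0}^m\sum_{\ell=1}^{L_0-1}\mathbb{P}[R_{n,\ell,1}\cap\{\hat\jmath_L \le j_{\ell-1}\}] \to 0$ as $n \to \infty$. Moreover, we prove similarly that $\sum_{L=L_0}^m\sum_{\ell=1}^{L_0-1}\mathbb{P}[R_{n,\ell,2}] \to 0$ as $n \to \infty$.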

Let us now focus on $\sum_{L=L_0}^m\sum_{\ell=1}^{L_0-1}\mathbb{P}[R_{n,\ell,3}]$. Note that $\mathbb{P}[R_{n,\ell,3}]$ can be split into four terms as follows:

$$\mathbb{P}[R_{n,\ell,3}] = \mathbb{P}[R^{(1)}_{n,\ell,3}] + \mathbb{P}[R^{(2)}_{n,\ell,3}] + \mathbb{P}[R^{(3)}_{n,\ell,3}] + \mathbb{P}[R^{(4)}_{n,\ell,3}],$$


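where

$$R^{(1)}_{n,\ell,3} := R_{n,\ell,3}\cap\{j_{\ell-1} < \hat\jmath_q,\ \hat\jmath_{q+1} < j_{\ell+1}\},$$
$$R^{(2)}_{n,\ell,3} := R_{n,\ell,3}\cap\{j_{\ell-1} < \hat\jmath_q,\ \hat\jmath_{q+1} \ge j_{\ell+1}\},$$
$$R^{(3)}_{n,\ell,3} := R_{n,\ell,3}\cap\{\hat\jmath_q \le j_{\ell-1},\ j_{\ell-1} < \hat\jmath_{q+1} < j_{\ell+1}\},$$
$$R^{(4)}_{n,\ell,3} := R_{n,\ell,3}\cap\{\hat\jmath_q \le j_{\ell-1},\ \hat\jmath_{q+1} \ge j_{\ell+1}\}.$$

We have to use Lemma 2 twice. For $\mathbb{P}[R^{(1)}_{n,\ell,3}]$, we first use (31) in Lemma 2 with $j = j_\ell$ and $j = \hat\jmath_q + 1$, respectively, which gives, with probability one,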


Thus, 


{j£ - Jq - 2) {Po,h-l,m - + V^MnQq + 1; “ 1) 


< W 


ig+ljl' 


(41) 


’k + hje ^ I ^Mn{jq + 1; j£ - 1) 


w 

j£ - jq- 2 


ji - jq-‘^ 

Second, we use (31) in Lemma 2 with $j = j_\ell + 1$ and $j = \hat\jmath_{q+1}$, respectively, to get, with probability one,

{jq+l ~ j£ ~ 2 ) { (/3oj^_,_i-l,m ~ V^^n{j£ + 1 j jg+l ~ 1 ) 

Hence 

|/5o,j£+i-l,m - 
Define the event 


^ %+i,5,+i- 


^ . ^Mn{j£ + l-,jq+l - 1 ) 


jq+1 - j£-‘2‘ 




W 


jq + ljt 


\j£ - jq - 2 | 


+ 


jq+1 - J> - 2 

y/rnMn{jq + l-,j£ - 1) 


j£ jq 


W 


+ 


' • I 1 “ 

J^ + 1 Jq+1 


\jq+l - j£- ‘2' 


+ 


VrnMnUe + l]jq+l - 1) 




W 


jq+lJe 
m£„ — 2 


+ 


jq+1 - j£- 2 
y/mMn{jq + l;j> - 1) 


mSri — 2 


w. 


V ■ I 1 • 


y/mMn{j£ + 1; jg +1 - 1) 
me„ — 2 


ip[<bi 


mSn — 2 

We observe that the event $Q^{(1)}_{n,\ell,3}$ occurs with probability one, so that

< P 
+ P 
+ P 
+ P 


r%+ij. ^ 

Po,je+i-l,m ljo,ji-l,m\' 

-men — 2 

4 J 


^i«+ljq+i ^ |/5oj>+i-l,m Poji-l,' 


m£n — 2 4 

ymMnijq + l',j£ — 1) \ljo,jtj_^-l,m ~ /^Oj>-l,r 


m£r, — 2 


VinMn{j£ + 1; jg+1 — 1) ^ \/jo,je+i-l,m Po,ji-l,r 

4 


m£n — 2 
Using Lemmas 2 and 3 and (16)-(17) from Assumption 4, each term of the last inequality goes to zero as $n \to \infty$. For $\mathbb{P}[R^{(2)}_{n,\ell,3}]$, we apply Lemma 2 with $j = j_\ell$ and $j = \hat\jmath_q + 1$ to obtain (41), and then with $j = j_\ell + 1$ and $j = j_{\ell+1}$ to get


{k+i - k - 2 ) {} + VkiMnUe + 1 ; je+i - 1 ) 


— jl+i ■ 


( 2 ) 

It follows that the event $Q^{(2)}_{n,\ell,3}$ occurs with probability one, where




w~- 


jq+^,je 


\je - jg - 2 | 


+ 


kmMnijq + l-,je - 1 ) 


Jl Jq 


\je+i - k - 2| 


VrnMnUe + kje+i - 1 ) 


w 




^ { |/^0j£+i-l,m /3o,ife-l,m| < _2 + 


je +1 - ji-‘^ 

y/rnMnjjq + kje - 1) 
mSn — 2 


, '^h+ijl+1 I 

A • ■ — 2 


yfmMkji + 1; jq+i - 1) 


^j,min 2 


Then 




( 2 ) r^^( 2 ) 


< P 




mEr, — 2 


+p 

+p 

+p 




A • ■ — 2 


\/kiMn{jq + 1; — 1) ^ 


men — 2 




A • ■ — 2 


Using Lemmas 2 and 3 and (16)-(17) in Assumption 4, each term of the last inequality tends to zero as $n \to +\infty$. For $\mathbb{P}[R^{(3)}_{n,\ell,3}]$, we first use Lemma 2 with $j = j_{\ell-1} + 1$ and $j = j_\ell$ to get


[k - k-i - 2 ) (/3oj^-i,m - + VmMn{je-i + l;je - 1 ) 

Then, with $j = j_\ell + 1$ and $j = \hat\jmath_{q+1}$, we obtain

(jg+l “ ji ~ 2 ) (/3ojf+i-l,m — VkMniJe + ^'ijq+l ~ 1) 




< re 


i«+iiA+i ■ 
Hence the event $Q^{(3)}_{n,\ell,3}$ occurs with probability one, where


^ni,3 ~ { \Po,je+i-l,m (3o,ji-l,m\ < , . _ . _ ol 




\je - ji-1 


^/mMnije-i + l;je - 1 ) 


je — ji-i — 2 


w 


c { |/3o„ 




- /3o,7fe- 


\jq+l - - 2| 


VmMnUe + i;iq+i - 1) 


jk-l,m\ < 


A • ■ — 2 


+ 


ig+1 - j£ - 2 
y/m^nUe-i + 1; - 1) 


A ■ ■ — 2 


W 


+ 


+ 1 Jg+l 

me„ — 2 


+ 


y/mMn{je + l;jg+l “ 1) 


me„ — 2 


Then 




)(3) 


< 


P 


w 


Je- 


1+Ij> ^ Po,je-l,r. 


A ■ ■ — 2 




m£„ — 2 


+P 

+P 

+P 


By Lemmas 2 and 3 and (16)-(17) in Assumption 4, each term of the last inequality tends to zero as $n \to +\infty$. Finally, for $\mathbb{P}[R^{(4)}_{n,\ell,3}]$, we first use Lemma 2 with $j = j_{\ell-1} + 1$ and $j = j_\ell$ to obtain


kmMn{je-i + l-,je 

- 1 ) 



A ■ ■ — 2 

^j,mm ^ 



4 J 

y/mMnije + l;jg+i 

- 1 ) 



m£n — 2 


4 J 


{je - jt-i - 2) {} + VmMnUi-i + l-,je - 1) 
Second, we use Lemma 2 with $j = j_\ell + 1$ and $j = j_{\ell+1}$ to obtain

{ji+i - k - 2 ) (/3oj,+i-i.m - , ,_i m) + VmMniji + kji+i - 1 ) 






It follows that the event $Q^{(4)}_{n,\ell,3}$ occurs with probability one, where


Qn/,3 ~ j /3oj£-l,m| < ^_ 2 \^ 




s/rhMn{ji-i + l]je - 1) 


l J>+1 


\ji+i - j£ - 2 | 

c{\P0,n.,-l,rn-P0,,-l,rn\<^^^ 

I ^j^min ^ 


k - k-i - 2 
kmMniji + kk+i - 1 ) 


je+i -k-‘^ 


^/rriMn{ji-i + 1; J> - 1) 


A ■ ■ — 2 


A ■ ■ — 2 


y/rnMn{j£ + kji+i - 1 ) 


A — 2 
Then


IPl-RiTl = IPKhnellal 


(4) r.^(4) 


< P 


W 


u 








A ■ ■ — 2 


+p 

+p 

+p 

$\to 0$ as $n \to \infty$. This concludes the proof of Theorem 4.
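For experiments, the quantity controlled in Theorem 4 can be computed directly from the estimated and true change-points. The helper below is a sketch of ours; it assumes the usual one-sided Hausdorff-type definition $\mathcal{E}(A\|B) = \sup_{b\in B}\inf_{a\in A}|a - b|$, and should be adapted if the definition used in the statement of Theorem 4 differs.

```python
def one_sided_distance(estimated, true):
    """E(estimated || true) = max over true points b of min over estimates a of |a - b|
    (assumed definition, see the lead-in)."""
    if not true:
        return 0.0
    if not estimated:
        return float("inf")
    return max(min(abs(a - b) for a in estimated) for b in true)

# Each true change-point has an estimate within 0.02, despite two spurious estimates.
print(one_sided_distance([0.1, 0.48, 0.52, 0.9], [0.5, 0.9]))
```

Adding estimated change-points can only decrease $\mathcal{E}(\hat{\mathcal T}\|\mathcal T_0)$, so this criterion does not penalize over-segmentation.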

Appendix A 

Here we prove Proposition 1 and Lemmas 1, 2 and 3.

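A.1 Proof of Proposition 1

Fix $j \in \{1,\ldots,m\}$. We have

$$U_j = \frac1n\sum_{i=1}^n\int_0^1\mathbb{1}_{((j-1)/m,\,1]}(s)\,dM_i(s),\qquad V_j = n\langle U_j\rangle = \int_0^1\mathbb{1}_{((j-1)/m,\,1]}(s)\lambda_0(s)\,ds,$$

$$\hat V_j = n[U_j] = \frac1n\sum_{i=1}^n\int_0^1\mathbb{1}_{((j-1)/m,\,1]}(s)\,dN_i(s).$$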




The classical Bernstein deviation inequality applied to $U_j$, see [34], yields that


P 


z 1 


fi 


\Uj\>V^+—,- ^Js)Xo{s)d{s) < 

OTI Ti Jq ^ m ’ J 

for all $\theta > 0$ and $z > 0$. It follows that

P 


< 2e- 




< 2e' 


By choosing $\theta = c_0(z + 1)/n$, this gives
P 

For any $0 < \eta < \theta < \infty$, we have
126ViZ z 




n 


< 2e- 


{|V 


Jl — 


r]n 3n 



^ \Po,je+i-l,m /3o,ji-l,m\ 

A ■ ■ — 2 

4 

y/mMn{je + l-,j£+i - 1) 

^ |/3o,jf+l-l,m — /do,ji-l,m\ 

A ■ ■ — 2 

4 


□ 


(A.l) 


(A.2) 


(A.3) 


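$$\Big\{|U_j| \ge \sqrt{\frac{2\theta V_jz}{\eta n}} + \frac{z}{3n}\Big\}\cap\{\eta < V_j \le \theta\} \subset \Big\{|U_j| \ge \sqrt{\frac{2\theta z}{n}} + \frac{z}{3n}\Big\}\cap\{\eta < V_j \le \theta\}.$$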
Together with (A.2), we obtain


P 


\Uj\ > 


l2eV4Z z 


< 2e"T 


r]n 3n 

Now we want to replace $V_j$ by the observable $\hat V_j$ in the deviation (A.2). Define $\tilde U_j$ by


(A.4) 


$$\tilde U_j = \hat V_j - V_j = \frac1n\sum_{i=1}^n\int_0^1\mathbb{1}_{((j-1)/m,\,1]}(s)\,dM_i(s).$$

Now writing again (A.4) and using the same argument as before, we arrive at 


P 




l2eViZ z 

ijn 6n . 


< 2e 


(A.5) 


But, if Vj satisfies 


then it satisfies 


and 


\Uj \< 


l2dVjZ z 
ijn 3n’ 


Vj < 2Vj + 2(—H o) 
rj 6 n 


e ,e 1 , 


e\ z 


Vj < 2Vj + 2(- + 2i -(^ —h-) + 2-1 — 

V3 V rj rj 3 rj/ n 

simply using the fact that $A \le b + \sqrt{aA}$ entails $A \le a + 2b$ for any $a, A, b > 0$. This proves that


\Uj\< 


l26V-iZ z 

+ 


rjn 3n 


}-{ 


U, 


< 


i2evz 


rjn 3n. 




(A.6) 
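For completeness, the elementary fact invoked above follows in one line from $\sqrt{aA} \le (a + A)/2$ (our addition):

```latex
A \le b + \sqrt{aA} \le b + \frac{a}{2} + \frac{A}{2}
\quad\Longrightarrow\quad \frac{A}{2} \le b + \frac{a}{2}
\quad\Longrightarrow\quad A \le a + 2b .
```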


So, using (A.4) and (A.5), we obtain 


P 


ez. 


n 


9,9 




rj^T] 3' J n 


< 4e' 


This inequality is similar to (A.4), where we replaced $V_j$ by the observable $\hat V_j$. It remains to remove the event $\{\eta < V_j \le \theta\}$ from this inequality. First, recall that (A.3) holds, so we can work on the event $\{V_j > c_0(z + 1)/n\}$ from now on. We use a peeling argument. Define, for $q \ge 0$,

$$\theta_q = c_0\frac{z + 1}{n}(1 + \varepsilon)^q,$$


and use the following decomposition into disjoint sets:

$$\{V_j > \theta_0\} = \bigcup_{q\ge0}\{\theta_q < V_j \le \theta_{q+1}\}.$$
We have


where 


P 


\Uj\ > Ci^e\l +C2,£^,0q < Vj < 0g-^-l 


< 4e" 


ci,e — 2 \/l + £ and C 2 ,e — 21/(1 + e) (-+e) +-. 


Let $h_j = c_h\log\log\big(\frac{V_j}{\theta_0}\vee e\big)$. On the event

$$\Big\{|\tilde U_j| < \sqrt{\frac{2(1 + \varepsilon)V_j(z + h_j)}{n}} + \frac{z + h_j}{3n}\Big\},$$

we have


Vj < 2Vj + 2{^ +£)- + 2-^-^Chloglog (^ V e), 
3 n n 00 


which entails, assuming that eco > 2((1 + e) + ^)ch, 


yj< 


eco{z + 1) 


2V,+2(^+8)^), 


eco{z + 1) - 2(| +e)ch 

where we used the fact that $\log\log z \le z/e - 1$ for any $z \ge e$. This entails, together with (A.6), the following embedding:


{|C:/I 


< 


2(1 + e)(z + hj)Vj ^ z h 
n 3n 




2(1 + £){z + hj)Vj _|_ (-^ T hj) 
n 3n 


{\Ui\ 


C \UA > Cl, 


' z + h 


fr . ^ + hn,z,j \ 

- Vj + C2,£ - 1, 

n n ) 


where 


hn,zj = Ch log log 


2enVj + 2e(| + e)z 
eco(z + 1) - 2(| + e)ch 


V e 


Now, using the previous embeddings together with (A.4) and (A.5) we obtain 

Z hr, 


P 






n 


+ C2,. 


n 


q>0 


2(l+e)V,{^ + hi] ^ ^ ^ 

n 6n 




> 


q>0 


2(1 + e)V,{z + hi) ^ 8, < Vj < »,+, 


n 


3n 


< 


4(e"^ + E ) 


g>i 


= 4(1 + (log(l+ e)) '"'“E^ 

9>1 

Then, together with (A.3), this implies that


P 


\Uj\ ^ ci^£ 


z hn^z^j tV . T 1 T h. 


n 


-^7 + C3,, 




n 


< (6 + 4(log(l + e)) 

g>i 


where = ^2 max (cq, 2(1 + e)(| + e)) + |. 


□ 
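The constant $c$ of Proposition 1 comes from summing the peeling bounds. As a numerical sanity check (ours), assume that the bound on the $q$-th slice is of order $e^{-z-h_q}$ with $h_q = c_h\log\log((1 + \varepsilon)^q\vee e)$, since $\theta_q/\theta_0 = (1 + \varepsilon)^q$. The sketch verifies the term-by-term inequality $e^{-h_q} \le (q\log(1 + \varepsilon))^{-c_h}$, which yields the factor $(\log(1 + \varepsilon))^{-c_h}\sum_{q\ge1}q^{-c_h}$ (finite because $c_h > 1$). The function names are hypothetical.

```python
import math

def peeling_term(q, eps, c_h):
    """exp(-h_q) with h_q = c_h * log log(max((1 + eps)^q, e))."""
    ratio = (1.0 + eps) ** q
    return math.exp(-c_h * math.log(math.log(max(ratio, math.e))))

def peeling_majorant(q, eps, c_h):
    """(q * log(1 + eps))^(-c_h), the term summed in the constant c."""
    return (q * math.log(1.0 + eps)) ** (-c_h)

# Truncated value of c = 6 + 4 (log(1 + eps))^(-c_h) sum_{q >= 1} q^(-c_h), for eps = 1, c_h = 2.
c_partial = 6 + 4 * sum(peeling_majorant(q, 1.0, 2.0) for q in range(1, 10001))
```

For small $q$ the truncation at $e$ makes $h_q = 0$, and the inequality still holds because $q\log(1 + \varepsilon) < 1$ in that range.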
A.2 Proof of Lemma 1

Using the fact that the functions $\{\lambda_{j,m} : j = 1,\ldots,m\}$ form a basis of $\Lambda_m$, and under Assumption 1, one can give the explicit form of $\lambda_{0,m}$ as follows:

$$\lambda_{0,m} = \sum_{j=1}^m\Big(\sqrt m\sum_{\ell=1}^{L_0}\beta_{0,\ell}|J_\ell\cap I_{j,m}|\Big)\lambda_{j,m} = \sum_{j=1}^m\beta_{0,j,m}\lambda_{j,m},$$

where $|A|$ is the Lebesgue measure of the set $A$ and $\beta_{0,j,m} = \sqrt m\sum_{\ell=1}^{L_0}\beta_{0,\ell}|J_\ell\cap I_{j,m}|$. We remark that the intervals $J_\ell$ do not share the same boundaries as the smaller intervals $I_{j,m}$. Let $(\ell_j)_{j=0,\ldots,m}$ be the sequence defined by

$$\ell_0 = 1,\quad\text{and}\quad\ell_j = \max\{\ell = 1,\ldots,L_0 : J_\ell\cap I_{j,m} \ne \emptyset\},\quad j = 1,\ldots,m.$$


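Using the sequence $(\ell_j)_{j=0,\ldots,m}$, one can express the functions $\lambda_0$ and $\lambda_{0,m}$ as follows:

$$\lambda_0 = \sum_{j=1}^m\sum_{\ell=\ell_{j-1}}^{\ell_j}\beta_{0,\ell}\mathbb{1}_{J_\ell\cap I_{j,m}}\quad\text{and}\quad\lambda_{0,m} = \sum_{j=1}^m\sum_{\ell=\ell_{j-1}}^{\ell_j}\alpha_{0,j,m}\mathbb{1}_{J_\ell\cap I_{j,m}},$$

where $\alpha_{0,j,m} = m\sum_{\ell'=\ell_{j-1}}^{\ell_j}\beta_{0,\ell'}|J_{\ell'}\cap I_{j,m}|$. Since $\{\mathbb{1}_{J_\ell\cap I_{j,m}} : j = 1,\ldots,m,\ \ell = 1,\ldots,L_0\}$ is an orthogonal family (with respect to the $L^2$-norm), we obtain

$$\|\lambda_0 - \lambda_{0,m}\|^2 = \sum_{j=1}^m\sum_{\ell=\ell_{j-1}}^{\ell_j}\big(\beta_{0,\ell} - \alpha_{0,j,m}\big)^2|J_\ell\cap I_{j,m}|.$$

Since $\sum_{\ell'}m|J_{\ell'}\cap I_{j,m}| = 1$, the coefficient $\alpha_{0,j,m}$ is a convex combination of $\beta_{0,\ell_{j-1}},\ldots,\beta_{0,\ell_j}$. Hence the $j$-th term vanishes when $\ell_j = \ell_{j-1}$, and is otherwise bounded by $(\ell_j - \ell_{j-1} + 1)\max_{\ell_{j-1}\le\ell,\ell'\le\ell_j}(\beta_{0,\ell} - \beta_{0,\ell'})^2|I_{j,m}|$. Using $|I_{j,m}| = 1/m$, $\ell_j - \ell_{j-1} + 1 \le 2(\ell_j - \ell_{j-1})$ whenever $\ell_j > \ell_{j-1}$, and $\sum_{j=1}^m(\ell_j - \ell_{j-1}) = L_0 - 1$, we get

$$\|\lambda_0 - \lambda_{0,m}\|^2 \le \frac{2(L_0 - 1)\Delta_{\beta,\max}^2}{m}.$$

This proves Lemma 1.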


□ 
A.3 Proof of Lemma 2


To prove Lemma 2, we invoke subdifferential calculus, see [5]. We first write our objective 
functional as 

^ m m 

j=i i=i 

So a necessary and sufficient condition for a vector $\hat\mu$ in $\mathbb{R}^m$ to minimize the function $\phi$ is that the zero vector in $\mathbb{R}^m$ belongs to the subdifferential of $\phi$ at the point $\hat\mu$, that is, the following optimality condition holds:

for all $j = 1,\ldots,m$,


'’’(N - T/i)^ = t()jsign(/lj,m), if Aj.m / 0, 

- Tfj)^ < WjSign{llj^rn), if fij,m = 0. 


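We use that $(T^\top x)_j = \sum_{q=j}^m x_q$ for any $x \in \mathbb{R}^m$, since $T$ is an $m\times m$ lower triangular matrix having all its nonzero elements equal to one. Now, for $q = 1,\ldots,m$, we observe that

$$\hat N_q = \sqrt m\int_{I_{q,m}}\lambda_0(t)\,dt + \sqrt m\,M_n(I_{q,m}) = \sqrt m\int_{I_{q,m}}\big(\lambda_0(t) - \lambda_{0,m}(t)\big)\,dt + \sqrt m\int_{I_{q,m}}\lambda_{0,m}(t)\,dt + \sqrt m\,M_n(I_{q,m}) = \sqrt m\int_{I_{q,m}}\sum_{j=1}^m\beta_{0,j,m}\lambda_{j,m}(t)\,dt + \sqrt m\,M_n(I_{q,m}) = \beta_{0,q,m} + \sqrt m\,M_n(I_{q,m}),$$

where we used that $\int_{I_{q,m}}(\lambda_0 - \lambda_{0,m}) = 0$, since $\lambda_{0,m}$ is the orthogonal projection of $\lambda_0$ onto $\Lambda_m$. Plugging this into the optimality condition, we get the desired result.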


□ 


A.4 Proof of Lemma 3 


For the first statement, we have by definition

$$M_n(a;b) = \sum_{q=a}^b M_n(I_{q,m}) = \frac1n\sum_{i=1}^n\sum_{q=a}^b\int_0^1\mathbb{1}_{I_{q,m}}(t)\,dM_i(t) = \frac1n\sum_{i=1}^n\int_0^1\mathbb{1}_{((a-1)/m,\,b/m]}(t)\,dM_i(t).$$


Moreover, using Bernstein’s inequality, it follows that, for any z, a > 0, 


P 


n „i 


-y" / 1,^ ±At)dMi{t) 

Tl ^^ Jq ^ m ’mJ 
n 


> 2:, 


1 1 c ^ 

Lgi, ^2exp{-^-^}. 


2a -|- ^pz ■ 
where $p$ is an upper bound of


1 -I 

'• m ’ m J 


a = n 


-1 


. Here we can choose p = ^ and 
a — 1 b 


Xo{t)dt = n 


iV„ 


m m 


Hence, we obtain the hrst statement. For the second one, recall that for any a = 2,. 
we have 


/ m{x + logm + hn,x,a)Va , ^/m{x + 1 + logm + hn,x,a) 

Wa = Cl\ -h C 2 -. 

\j n n 
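The weights recalled above can be computed directly from the pooled event times of the $n$ processes. The sketch below assumes that $\hat V_a=N_n(((a-1)/m,1])$, treats $\hat h_{n,x,a}$ as a fixed number `h`, and leaves $c_1,c_2$ as free parameters; these are illustrative choices, not calibrated values. It shows that the weights shrink as fewer events remain to the right of $(a-1)/m$.

```python
import math
from bisect import bisect_right


def tail_counts(event_times, n, m):
    """V_a = N_n(((a-1)/m, 1]) for a = 1..m: number of pooled events after
    (a-1)/m, divided by n (assumed form of hat V_a)."""
    times = sorted(event_times)
    return [(len(times) - bisect_right(times, (a - 1) / m)) / n
            for a in range(1, m + 1)]


def weights(V, n, m, x, h, c1, c2):
    """w_a = c1*sqrt(m (x + log m + h) V_a / n) + c2*sqrt(m) (x + 1 + log m + h) / n.
    h stands in for hat h_{n,x,a} and is held constant (assumption).
    Entry 0 corresponds to a = 1, which the penalty does not use."""
    log_m = math.log(m)
    return [c1 * math.sqrt(m * (x + log_m + h) * v / n)
            + c2 * math.sqrt(m) * (x + 1 + log_m + h) / n
            for v in V]


V = tail_counts([0.1, 0.2, 0.35, 0.6, 0.9], n=1, m=4)
print(V)  # [5.0, 3.0, 2.0, 1.0]
w = weights(V, n=1, m=4, x=1.0, h=0.0, c1=1.0, c2=1.0)
```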

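Since each term of $\hat w_a$ is positive, and taking into account the dominant one, a large value of $\hat w_a$ (of order $\sqrt{m\,\xi\log m/n}$) can only occur on the event $\{N_n((\frac{a-1}{m},1])>\xi\}$, for all $\xi>0$. By the Doob–Meyer decomposition theorem, $N_n((\frac{a-1}{m},1])=\int_{\frac{a-1}{m}}^{1}\lambda_0(t)\,dt+M_n(a;m)$, hence
$$\Big\{N_n\Big(\Big(\frac{a-1}{m},1\Big]\Big)>\xi\Big\}\subset\Big\{M_n(a;m)>\xi-\int_{\frac{a-1}{m}}^{1}\lambda_0(t)\,dt\Big\}.$$
Finally, by applying the first statement, see (33), to $M_n(a;m)$, we conclude the proof of Lemma 3. □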
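The Bernstein step in the proof of the first statement of Lemma 3 can be illustrated by simulation. The sketch below assumes that the $N_i$ are i.i.d. Poisson processes, so that $N_i(((a-1)/m,b/m])$ is Poisson with mean $\Lambda=\int_{(a-1)/m}^{b/m}\lambda_0(t)\,dt$, and compares a Monte Carlo estimate of $\mathbb{P}[|M_n(a;b)|>z]$ with $2\exp\{-nz^2/(2\Lambda+2z/3)\}$. The Poisson model, the sampler and the parameter values are assumptions made only for this illustration.

```python
import math
import random


def bernstein_tail(n, z, expected_count):
    """2 exp(-n z^2 / (2 E[N_n(I)] + 2 z / 3)): Bernstein's bound with
    rho = 1/n and sigma^2 = E[N_n(I)] / n."""
    return 2.0 * math.exp(-n * z * z / (2.0 * expected_count + 2.0 * z / 3.0))


def poisson(rng, lam):
    """Knuth's multiplication sampler (adequate for small lam)."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def empirical_tail(n, z, lam, reps=2000, seed=0):
    """Monte Carlo estimate of P(|M_n(a;b)| > z) when N_1, ..., N_n are
    i.i.d. Poisson processes whose compensator over the interval is lam."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        total = sum(poisson(rng, lam) for _ in range(n))
        m_n = total / n - lam  # (1/n) * sum_i (N_i(I) - Lambda_0(I))
        if abs(m_n) > z:
            hits += 1
    return hits / reps


print(bernstein_tail(50, 0.3, 1.0), empirical_tail(50, 0.3, 1.0))
```

The bound is about $0.26$ here, while the simulated frequency is much smaller: the inequality is valid but conservative, which is all the proof requires.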


Appendix B 

Here we prove the second case of the proof of Theorem 3, which is quite similar to the first one, up to a careful choice of the bounded terms in the approximate change-point sequence when applying the KKT optimality conditions. As $m\delta_n\ge6$ for all $n\ge1$, the event $\{\hat j_\ell>j_\ell+2\}$ holds a.s. on $A_{n,\ell}\cap\{\hat j_\ell>j_\ell\}$.


B.1 Step II.1. Prove: $\mathbb{P}[A_{n,\ell}\cap C_n]\to0$, as $n\to\infty$.

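Applying (31) in Lemma 2 with $j=j_\ell+1$ and $j=\hat j_\ell$, we get
$$\Big|\sum_{q=j_\ell+1}^{m}N_q-\sum_{q=j_\ell+1}^{m}\hat\beta_{q,m}\Big|\le\hat w_{j_\ell+1}$$
and
$$\sum_{q=\hat j_\ell}^{m}N_q-\sum_{q=\hat j_\ell}^{m}\hat\beta_{q,m}=\hat w_{\hat j_\ell}\operatorname{sign}\big(\hat\beta_{\hat j_\ell,m}-\hat\beta_{\hat j_\ell-1,m}\big).$$
It follows, using the decomposition of $N_q$ obtained in the proof of Lemma 2, that
$$\Big|\sum_{q=j_\ell+1}^{\hat j_\ell-1}\beta_{0,q,m}+\sqrt m\,M_n(j_\ell+1;\hat j_\ell-1)-\sum_{q=j_\ell+1}^{\hat j_\ell-1}\hat\beta_{q,m}\Big|\le\hat w_{j_\ell+1}+\hat w_{\hat j_\ell}.$$
The property of the vector $\hat\beta$ in Lemma 2 yields that
$$\big|(\hat j_\ell-j_\ell-2)(\beta_{0,j_{\ell+1}-1,m}-\hat\beta_{\hat j_\ell-1,m})+\sqrt m\,M_n(j_\ell+1;\hat j_\ell-1)\big|\le\hat w_{j_\ell+1}+\hat w_{\hat j_\ell}.$$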
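Therefore, writing $\beta_{0,j_{\ell+1}-1,m}-\hat\beta_{\hat j_\ell-1,m}=(\beta_{0,j_\ell-1,m}-\hat\beta_{\hat j_\ell-1,m})+(\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m})$, we have on $C_n\cap\{\hat j_\ell>j_\ell+2\}$
$$\big|(\hat j_\ell-j_\ell-2)(\beta_{0,j_\ell-1,m}-\hat\beta_{\hat j_\ell-1,m})+\sqrt m\,M_n(j_\ell+1;\hat j_\ell-1)+(\hat j_\ell-j_\ell-2)(\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m})\big|\le\hat w_{j_\ell+1}+\hat w_{\hat j_\ell}.$$
Define the event
$$C'_{n,\ell}=\Big\{\big|(\hat j_\ell-j_\ell-2)(\beta_{0,j_\ell-1,m}-\hat\beta_{\hat j_\ell-1,m})+\sqrt m\,M_n(j_\ell+1;\hat j_\ell-1)+(\hat j_\ell-j_\ell-2)(\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m})\big|\le\hat w_{j_\ell+1}+\hat w_{\hat j_\ell}\Big\}.$$
It follows that $C'_{n,\ell}$ occurs with probability one on $C_n\cap\{\hat j_\ell>j_\ell+2\}$. We observe that $m\delta_n\ge6$ for all $n$ entails that $m\delta_n-2\ge m\delta_n/2$. Then
$$\{|\hat j_\ell-j_\ell|>m\delta_n\}\subset\{|\hat j_\ell-j_\ell-2|>m\delta_n-2\}\subset\Big\{|\hat j_\ell-j_\ell-2|>\frac{m\delta_n}{2}\Big\}.$$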


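Therefore
$$\begin{aligned}
\mathbb{P}[A_{n,\ell}\cap C_n\cap\{\hat j_\ell>j_\ell+2\}]
&=\mathbb{P}[C'_{n,\ell}\cap A_{n,\ell}\cap C_n\cap\{\hat j_\ell>j_\ell+2\}]\\
&\le\mathbb{P}\Big[\hat w_{j_\ell+1}+\hat w_{\hat j_\ell}\ge\frac{m\delta_n\Delta_{\beta,\min}}{6}\Big]\\
&\quad+\mathbb{P}\Big[\Big\{|\beta_{0,j_\ell-1,m}-\hat\beta_{\hat j_\ell-1,m}|\ge\frac{|\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m}|}{3}\Big\}\cap C_n\cap\{\hat j_\ell>j_\ell\}\Big]\\
&\quad+\mathbb{P}\Big[\Big\{\sqrt m\,|M_n(j_\ell+1;\hat j_\ell-1)|\ge\frac{(\hat j_\ell-j_\ell-2)|\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m}|}{3}\Big\}\cap A_{n,\ell}\cap C_n\cap\{\hat j_\ell>j_\ell+2\}\Big]\\
&=:\mathbb{P}[A'_{n,\ell,1}]+\mathbb{P}[A'_{n,\ell,2}]+\mathbb{P}[A'_{n,\ell,3}].
\end{aligned}$$
Indeed, on $C'_{n,\ell}$ one of the three terms must be at least one third of $(\hat j_\ell-j_\ell-2)|\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m}|$, and on $A_{n,\ell}$ this quantity exceeds $m\delta_n\Delta_{\beta,\min}/2$.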

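We now bound $\mathbb{P}[A'_{n,\ell,1}]$. Since $\hat j_\ell>j_\ell$, both weights $\hat w_{j_\ell+1}$ and $\hat w_{\hat j_\ell}$ are controlled through their dominant term, which involves $N_n((\frac{j_\ell}{m},1])$. By (16) in Assumption 4, and (34) in Lemma 3 with $\xi$ chosen above $\mathbb{E}[N_n((\frac{j_\ell}{m},1])]$, it follows that $\mathbb{P}[A'_{n,\ell,1}]\to0$,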
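as $n\to\infty$. Now, we remark that
$$A'_{n,\ell,3}\subset\bigcup_{q=j_\ell+2}^{j_{\ell+1}-2}\Big\{|M_n(j_\ell+1;q)|\ge\frac{m\delta_n\Delta_{\beta,\min}}{6\sqrt m}\Big\},$$
since on $A'_{n,\ell,3}$ we have $j_\ell+2\le\hat j_\ell-1\le j_{\ell+1}-2$ and $(\hat j_\ell-j_\ell-2)/3>m\delta_n/6$. Let $z_n:=\frac{m\delta_n\Delta_{\beta,\min}}{6\sqrt m}$. By (33) in Lemma 3, we get
$$\begin{aligned}
\mathbb{P}[A'_{n,\ell,3}]&\le\sum_{q=j_\ell+2}^{j_{\ell+1}-2}2\exp\Big\{-\frac{nz_n^2}{2\,\mathbb{E}\big[N_n\big(\big(\frac{j_\ell}{m},\frac{q}{m}\big]\big)\big]+\frac23z_n}\Big\}\\
&\le2(j_{\ell+1}-j_\ell-3)\exp\Big\{-\frac{nz_n^2}{2\,\mathbb{E}\big[N_n\big(\big(\frac{j_\ell}{m},\frac{j_{\ell+1}-2}{m}\big]\big)\big]+\frac23z_n}\Big\}\\
&\le2\exp\Big\{-\frac{nz_n^2}{2\,\mathbb{E}\big[N_n\big(\big(\frac{j_\ell}{m},\frac{j_{\ell+1}-2}{m}\big]\big)\big]+\frac23z_n}+\log m\Big\}.
\end{aligned}$$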
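The three inequalities above (a union over the admissible values of $\hat j_\ell-1$, replacement of each expectation by the largest one, and $j_{\ell+1}-j_\ell-3\le m$) can be checked numerically. The sketch below assumes a made-up constant intensity $\lambda_0=2$, so that $\mathbb{E}[N_n((j_\ell/m,q/m])]=2(q-j_\ell)/m$; the function names and numbers are hypothetical.

```python
import math


def tail(n, z, expected_count):
    """Bound (33): 2 exp(-n z^2 / (2 E[N_n(I)] + 2 z / 3))."""
    return 2.0 * math.exp(-n * z * z / (2.0 * expected_count + 2.0 * z / 3.0))


def union_chain(n, m, z, j_l, j_next, expected):
    """The three successive upper bounds on P[A'_{n,l,3}].
    expected(q) must return E[N_n((j_l/m, q/m])] and be nondecreasing in q."""
    qs = range(j_l + 2, j_next - 1)  # q = j_l + 2, ..., j_{l+1} - 2
    summed = sum(tail(n, z, expected(q)) for q in qs)
    exponent = -n * z * z / (2.0 * expected(j_next - 2) + 2.0 * z / 3.0)
    collapsed = 2.0 * len(qs) * math.exp(exponent)  # len(qs) = j_next - j_l - 3
    with_log = 2.0 * math.exp(exponent + math.log(m))
    return summed, collapsed, with_log


# hypothetical example: constant intensity lambda_0 = 2 on [0, 1]
N_PROC, M, J_L, J_NEXT = 200, 50, 10, 30


def expected(q):
    """E[N_n((J_L/M, q/M])] when lambda_0 = 2."""
    return 2.0 * (q - J_L) / M


print(union_chain(N_PROC, M, 0.3, J_L, J_NEXT, expected))
```

The last bound decreases as $n$ grows, which is how (16) in Assumption 4 is used next.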


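By (16) in Assumption 4, we get that $\mathbb{P}[A'_{n,\ell,3}]$ tends to zero as $n$ tends to infinity. Let us now address $\mathbb{P}[A'_{n,\ell,2}]$. Using (31) in Lemma 2 with $j=j_\ell$ and with $j=\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil$, and using the triangle inequality, it follows that
$$\Big|\sum_{q=\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}^{j_\ell-1}N_q-\sum_{q=\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}^{j_\ell-1}\hat\beta_{q,m}\Big|\le\hat w_{\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}+\hat w_{j_\ell}.$$
On the event $C_n\cap\{\hat j_\ell>j_\ell\}$, since these sums contain $j_\ell-\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil=\lfloor\frac{j_\ell-j_{\ell-1}}{2}\rfloor$ terms, we get
$$\Big|\Big\lfloor\frac{j_\ell-j_{\ell-1}}{2}\Big\rfloor(\beta_{0,j_\ell-1,m}-\hat\beta_{\hat j_\ell-1,m})+\sqrt m\,M_n\Big(\Big\lceil\frac{j_\ell+j_{\ell-1}}{2}\Big\rceil;j_\ell-1\Big)\Big|\le\hat w_{\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}+\hat w_{j_\ell}.$$
This implies
$$|\beta_{0,j_\ell-1,m}-\hat\beta_{\hat j_\ell-1,m}|\le\frac{\hat w_{\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}+\hat w_{j_\ell}+\sqrt m\,\big|M_n\big(\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil;j_\ell-1\big)\big|}{\lfloor\frac{j_\ell-j_{\ell-1}}{2}\rfloor}.$$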
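The count used above, $j_\ell-\lceil(j_\ell+j_{\ell-1})/2\rceil=\lfloor(j_\ell-j_{\ell-1})/2\rfloor$, is elementary but easy to get wrong by one. The short check below confirms it exhaustively for small index pairs; the helper names are illustrative.

```python
def ceil_half(k):
    """Integer ceiling of k / 2 for k >= 0."""
    return -(-k // 2)


def count_terms(j_prev, j_cur):
    """Number of indices q with ceil((j_prev + j_cur)/2) <= q <= j_cur - 1."""
    return j_cur - ceil_half(j_prev + j_cur)


# exhaustive check: count_terms(j_prev, j_cur) == floor((j_cur - j_prev) / 2)
assert all(count_terms(a, b) == (b - a) // 2
           for a in range(0, 60) for b in range(a + 1, 120))
```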

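Therefore, since $j_\ell-j_{\ell-1}\ge\Delta_{j,\min}$, we may upper bound $\mathbb{P}[A'_{n,\ell,2}]$ as follows:
$$\begin{aligned}
\mathbb{P}[A'_{n,\ell,2}]&=\mathbb{P}\Big[\Big\{|\beta_{0,j_\ell-1,m}-\hat\beta_{\hat j_\ell-1,m}|\ge\frac{|\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m}|}{3}\Big\}\cap C_n\cap\{\hat j_\ell>j_\ell\}\Big]\\
&\le\mathbb{P}\Big[\hat w_{\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}+\hat w_{j_\ell}\ge\Big\lfloor\frac{j_\ell-j_{\ell-1}}{2}\Big\rfloor\frac{|\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m}|}{6}\Big]\\
&\quad+\mathbb{P}\Big[\sqrt m\,\Big|M_n\Big(\Big\lceil\frac{j_\ell+j_{\ell-1}}{2}\Big\rceil;j_\ell-1\Big)\Big|\ge\Big\lfloor\frac{j_\ell-j_{\ell-1}}{2}\Big\rfloor\frac{|\beta_{0,j_{\ell+1}-1,m}-\beta_{0,j_\ell-1,m}|}{6}\Big]\\
&\le\mathbb{P}\Big[\hat w_{\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}+\hat w_{j_\ell}\ge\frac{\lfloor\Delta_{j,\min}/2\rfloor\Delta_{\beta,\min}}{6}\Big]+\mathbb{P}\Big[\Big|M_n\Big(\Big\lceil\frac{j_\ell+j_{\ell-1}}{2}\Big\rceil;j_\ell-1\Big)\Big|\ge\frac{\lfloor\Delta_{j,\min}/2\rfloor\Delta_{\beta,\min}}{6\sqrt m}\Big]\\
&=:\alpha_{n,\ell,1}+\alpha_{n,\ell,2}.
\end{aligned}$$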
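We observe that
$$\alpha_{n,\ell,1}\le\mathbb{P}\Big[\hat w_{\lceil\frac{j_\ell+j_{\ell-1}}{2}\rceil}\ge\frac{\lfloor\Delta_{j,\min}/2\rfloor\Delta_{\beta,\min}}{12}\Big]+\mathbb{P}\Big[\hat w_{j_\ell}\ge\frac{\lfloor\Delta_{j,\min}/2\rfloor\Delta_{\beta,\min}}{12}\Big].$$
By (17) in Assumption 4, and (34) in Lemma 3 applied to each of these two weights, it follows that $\alpha_{n,\ell,1}\to0$ as $n\to\infty$. By (33) in Lemma 3 with $z=\frac{\lfloor\Delta_{j,\min}/2\rfloor\Delta_{\beta,\min}}{6\sqrt m}$ and (17) in Assumption 4, we obtain
$$\alpha_{n,\ell,2}\le2\exp\Big\{-\frac{nz^2}{2\,\mathbb{E}\big[N_n\big(\big(\frac{\lceil(j_\ell+j_{\ell-1})/2\rceil-1}{m},\frac{j_\ell-1}{m}\big]\big)\big]+\frac23z}\Big\}\to0,$$
as $n\to\infty$. Therefore, we conclude that $\mathbb{P}[A'_{n,\ell,2}]\to0$ as $n\to\infty$.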


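B.2 Step II.2. Prove: $\mathbb{P}[A_{n,\ell}\cap C_n^c]\to0$, as $n\to\infty$.

As in Case I from Section 8.1, we split $\mathbb{P}[A_{n,\ell}\cap C_n^c]$ into
$$\mathbb{P}[A_{n,\ell}\cap C_n^c]=\mathbb{P}[A_{n,\ell}\cap D_n^{(l)}]+\mathbb{P}[A_{n,\ell}\cap D_n^{(m)}]+\mathbb{P}[A_{n,\ell}\cap D_n^{(r)}].$$
Let us first focus on $\mathbb{P}[A_{n,\ell}\cap D_n^{(m)}]$. Note that
$$\mathbb{P}[A_{n,\ell}\cap D_n^{(m)}\cap\{\hat j_\ell>j_\ell\}]\le\mathbb{P}[A_{n,\ell}\cap B_{\ell+1,\ell}\cap D_n^{(m)}]+\sum_{k=\ell+1}^{L_0-2}\mathbb{P}[C_n^c\cap B_{k+1,k}\cap D_n^{(m)}].\qquad\text{(B.1)}$$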


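Let us now prove that the first term in the right-hand side of (B.1) goes to zero as $n$ tends to infinity. Using (31) in Lemma 2 with $j=\lceil\frac{j_\ell+j_{\ell+1}}{2}\rceil$ and $j=j_{\ell+1}$ on the one hand, and (31) in Lemma 2 with $j=j_{\ell+1}+1$ and with $j=j_{\ell+2}$ on the other hand, we obtain, respectively,
$$\Big\lfloor\frac{j_{\ell+1}-j_\ell}{2}\Big\rfloor|\hat\beta_{j_{\ell+1}-1,m}-\beta_{0,j_{\ell+1}-1,m}|\le\hat w_{\lceil\frac{j_\ell+j_{\ell+1}}{2}\rceil}+\hat w_{j_{\ell+1}}+\Big|\sqrt m\,M_n\Big(\Big\lceil\frac{j_\ell+j_{\ell+1}}{2}\Big\rceil;j_{\ell+1}-1\Big)\Big|\qquad\text{(B.2)}$$
and
$$|j_{\ell+2}-j_{\ell+1}-2|\,|\hat\beta_{j_{\ell+1}-1,m}-\beta_{0,j_{\ell+2}-1,m}|\le\hat w_{j_{\ell+1}+1}+\hat w_{j_{\ell+2}}+|\sqrt m\,M_n(j_{\ell+1}+1;j_{\ell+2}-1)|.\qquad\text{(B.3)}$$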
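In addition, we have
$$\begin{aligned}
|\beta_{0,j_{\ell+2}-1,m}-\beta_{0,j_{\ell+1}-1,m}|&\le|\beta_{0,j_{\ell+1}-1,m}-\hat\beta_{j_{\ell+1}-1,m}|+|\hat\beta_{j_{\ell+1}-1,m}-\beta_{0,j_{\ell+2}-1,m}|\\
&\le\frac{\hat w_{\lceil\frac{j_\ell+j_{\ell+1}}{2}\rceil}+\hat w_{j_{\ell+1}}+|\sqrt m\,M_n(\lceil\frac{j_\ell+j_{\ell+1}}{2}\rceil;j_{\ell+1}-1)|}{\lfloor\frac{j_{\ell+1}-j_\ell}{2}\rfloor}+\frac{\hat w_{j_{\ell+1}+1}+\hat w_{j_{\ell+2}}+|\sqrt m\,M_n(j_{\ell+1}+1;j_{\ell+2}-1)|}{|j_{\ell+2}-j_{\ell+1}-2|}\\
&\le\frac{6}{\Delta_{j,\min}}S_{n,\ell},
\end{aligned}$$
where
$$S_{n,\ell}=\hat w_{\lceil\frac{j_\ell+j_{\ell+1}}{2}\rceil}+\hat w_{j_{\ell+1}}+\hat w_{j_{\ell+1}+1}+\hat w_{j_{\ell+2}}+\sqrt m\,\Big|M_n\Big(\Big\lceil\frac{j_\ell+j_{\ell+1}}{2}\Big\rceil;j_{\ell+1}-1\Big)\Big|+\sqrt m\,|M_n(j_{\ell+1}+1;j_{\ell+2}-1)|,$$
and the last step uses $\lfloor\frac{j_{\ell+1}-j_\ell}{2}\rfloor\ge\frac{\Delta_{j,\min}}{6}$ and $|j_{\ell+2}-j_{\ell+1}-2|\ge\frac{\Delta_{j,\min}}{6}$, which hold as soon as $\Delta_{j,\min}\ge3$. Define the event $E'_{n,\ell}$ by
$$E'_{n,\ell}=\Big\{|\beta_{0,j_{\ell+2}-1,m}-\beta_{0,j_{\ell+1}-1,m}|\le\frac{6}{\Delta_{j,\min}}S_{n,\ell}\Big\}.$$
Then $E'_{n,\ell}$ occurs with probability one on $A_{n,\ell}\cap B_{\ell+1,\ell}\cap D_n^{(m)}$. Therefore, we obtain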

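$$\begin{aligned}
\mathbb{P}[A_{n,\ell}\cap B_{\ell+1,\ell}\cap D_n^{(m)}]
&=\mathbb{P}[A_{n,\ell}\cap B_{\ell+1,\ell}\cap D_n^{(m)}\cap E'_{n,\ell}]\\
&\le\mathbb{P}\Big[\hat w_{\lceil\frac{j_\ell+j_{\ell+1}}{2}\rceil}+\hat w_{j_{\ell+1}}\ge\frac{\Delta_{j,\min}|\beta_{0,j_{\ell+2}-1,m}-\beta_{0,j_{\ell+1}-1,m}|}{24}\Big]\\
&\quad+\mathbb{P}\Big[\hat w_{j_{\ell+1}+1}+\hat w_{j_{\ell+2}}\ge\frac{\Delta_{j,\min}|\beta_{0,j_{\ell+2}-1,m}-\beta_{0,j_{\ell+1}-1,m}|}{24}\Big]\\
&\quad+\mathbb{P}\Big[\Big|M_n\Big(\Big\lceil\frac{j_\ell+j_{\ell+1}}{2}\Big\rceil;j_{\ell+1}-1\Big)\Big|\ge\frac{\Delta_{j,\min}|\beta_{0,j_{\ell+2}-1,m}-\beta_{0,j_{\ell+1}-1,m}|}{24\sqrt m}\Big]\\
&\quad+\mathbb{P}\Big[|M_n(j_{\ell+1}+1;j_{\ell+2}-1)|\ge\frac{\Delta_{j,\min}|\beta_{0,j_{\ell+2}-1,m}-\beta_{0,j_{\ell+1}-1,m}|}{24\sqrt m}\Big]\\
&=:\pi_{n,\ell,1}+\pi_{n,\ell,2}+\pi_{n,\ell,3}+\pi_{n,\ell,4},
\end{aligned}$$
since, on $E'_{n,\ell}$, one of the four groups of terms on its right-hand side is at least a quarter of $\Delta_{j,\min}|\beta_{0,j_{\ell+2}-1,m}-\beta_{0,j_{\ell+1}-1,m}|/6$.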
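The last bound rests on two elementary facts: the denominators $\lfloor(j_{\ell+1}-j_\ell)/2\rfloor$ and $|j_{\ell+2}-j_{\ell+1}-2|$ are at least one sixth of the spacing once the spacing is at least 3, and if a quantity is bounded by a sum of four terms, one of them is at least a quarter of it. The sketch checks both with exact arithmetic; the spacing condition is the assumption under which the factor $6/\Delta_{j,\min}$ is valid, and the helper names are illustrative.

```python
from fractions import Fraction


def denominators_ok(spacing):
    """Check floor(spacing/2) >= spacing/6 and spacing - 2 >= spacing/6,
    the two lower bounds used to factor out 6/Delta_{j,min}."""
    sixth = Fraction(spacing, 6)
    return spacing // 2 >= sixth and spacing - 2 >= sixth


def some_term_is_large(terms, total):
    """Pigeonhole: if total <= sum(terms), some term is >= total/len(terms)."""
    if total > sum(terms):
        raise ValueError("total must not exceed the sum of the terms")
    return max(terms) >= Fraction(total) / len(terms)


print(all(denominators_ok(s) for s in range(3, 1000)), denominators_ok(2))
```

The spacing 2 fails the second inequality, which is why a minimal spacing of 3 is needed.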

By (33)–(34) in Lemma 3, and (16)–(17) in Assumption 4, we show that $\pi_{n,\ell,s}\to0$ for $s=1,\ldots,4$, hence $\mathbb{P}[A_{n,\ell}\cap B_{\ell+1,\ell}\cap D_n^{(m)}]\to0$, as $n\to\infty$. Recall that in Case I from Section 8.1, we proved that $\mathbb{P}[A_{n,\ell}\cap D_n^{(l)}]\to0$ as $n\to\infty$, and in a similar way $\mathbb{P}[A_{n,\ell}\cap D_n^{(r)}]\to0$ as $n\to\infty$. This concludes the proof of Theorem 3.


□ 